Document Areas are datasources based on data extracted from a PDF document. Like any other datasource, they can be used for further processing, in print objects, or to make comparisons or decisions inside the program.
Area datasources are based on an area you specify and may contain text, graphics, or barcodes. When using text or barcodes as a source, you can use pattern matching to search for specific data within larger areas, replace parts of the text, and format the result. The captured area and the result can be previewed in the Document Area dialog.
You create a document area by selecting Document Area on the ribbon and then clicking Add Document Area. The cursor changes to a cross, and you can draw a rectangle on the document where you want to capture the data. The defined area is overlaid with a colored shade, and the Document Area dialog opens, where you can preview and name the defined area.
In the Content list, you can choose the type of data you have selected:
Text: The content of the area is interpreted as text and can be used in later processing.
Graphics: The area is treated as an image, which lets the program detect changes in the defined area from page to page in the document. Note that text contained in the image is not interpreted in this mode.
Barcode: The program tries to decode the image as a barcode. It automatically detects the type of barcode and returns the decoded data as a text string.
The following barcodes are supported:
1-D (Linear) barcodes: Code 11, Code 39, Code 39 Ext., Code 93, Code 128, 2 of 5 Interleaved, Codabar, Patch Code, EAN-8, EAN-13, UPC-A, and UPC-E (the EAN and UPC types with 2-digit and 5-digit supplements).
1-D Postal barcodes: USPS PostNet, USPS Planet, USPS Intelligent Mail Barcode, Royal Mail RM4SCC, and Australia Post.
2-D barcodes: PDF-417, DataMatrix, QR Code, and Micro QR Code.
Please note that in some PDF documents, barcodes are based on a barcode font. In that case you can simply select Text as the area type to get the coded data.
If you want the application to search every page, only the first page, or only the last page of each document for this area, select the desired option from the Type drop-down list. First and last page areas cannot be used until a document break has been defined. A document break indicates how to detect the end of a document in the document file.
When you have captured text or barcode data from the document, you can use pattern matching, replacements, and formatting to manipulate the captured text and get the result you want.
You can enter a pattern to search for in the Pattern filter box. The program will then try to locate a string with the pattern you have defined within the captured text.
For example, if you want to find a number of 6 digits, you can enter \d{6} as the pattern to search for, or you can simply enter Total as the pattern filter if you want to look for the word Total.
For more information on patterns, see Regular Expression Language Elements.
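The effect of the two example patterns can be tried outside the program. The following is only a sketch using Python's re module, which handles these simple patterns the same way as most regular expression engines. The sample text and the helper name first_match are made up for illustration and are not part of the program.

```python
import re

# Hypothetical text captured from a document area (illustration only).
captured = "Invoice 004512\nTotal 123456 EUR"

def first_match(pattern, text):
    """Return the first substring of text that matches pattern, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# \d{6} finds the first run of six digits in the captured text.
six_digits = first_match(r"\d{6}", captured)

# A plain word is also a valid pattern and matches itself.
word = first_match(r"Total", captured)

# A pattern that does not occur gives no result.
missing = first_match(r"\d{9}", captured)

print(six_digits, word, missing)
```

Note that the pattern returns the first match in the captured area, so a smaller area or a more specific pattern may be needed when several candidates are present.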
You can use the Replace text box to enter a replacement pattern. With it, you can replace part or all of the matched string.
For more information see Replacement Patterns.
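The general idea of replacing part or all of a matched string can be sketched as follows. This is an illustration in Python, not the program's own syntax: Python writes group references as \1, \2, and so on, while other engines commonly write $1, $2. The sample date and the patterns are assumptions chosen for the example, and the exact syntax accepted by the Replace box is the one described in Replacement Patterns.

```python
import re

# Hypothetical matched string (illustration only).
matched = "31/12/2024"

# Replace the whole match, reusing its captured groups in a new order.
pattern = r"(\d{2})/(\d{2})/(\d{4})"
reordered = re.sub(pattern, r"\3-\2-\1", matched)

# Replace only a part of the match: here, the trailing four digits.
masked = re.sub(r"\d{4}$", "****", matched)

print(reordered)
print(masked)
```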
The Format text box gives you a simple way to format the output. It lets you make the string a fixed number of characters and control its alignment.
For example, a format string of ######## will format 1234 to 00001234, and a format string of ###-### will format 123456 to 123-456.
For more information see Area Formatting.
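The two format examples above are consistent with a simple rule: each # is a placeholder for one character of the value, placeholders are filled from the right, unused placeholders become 0, and any other character is copied as is. The sketch below implements that rule in Python. The rule is inferred only from the two examples, and the function name apply_format is hypothetical, so the program's actual formatting behavior (described in Area Formatting) may differ.

```python
def apply_format(value: str, fmt: str) -> str:
    """Fill the '#' placeholders in fmt with the characters of value.

    Assumed rule (inferred from the two documented examples only):
    placeholders are filled from right to left, unused placeholders
    become '0', and every other character in fmt is copied literally.
    Characters of value that do not fit are dropped from the left.
    """
    remaining = list(value)
    out = []
    for f in reversed(fmt):
        if f == "#":
            # Take the last unused character, or pad with a zero.
            out.append(remaining.pop() if remaining else "0")
        else:
            out.append(f)
    return "".join(reversed(out))

print(apply_format("1234", "########"))
print(apply_format("123456", "###-###"))
```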
The result of pattern matching, replacements, and formatting can be previewed in the Preview box of the dialog. In most editions of the program, you can also check the results for each page of your file on the datasheet.