Foxtrot offers a built-in OCR action that is simple yet very powerful. The OCR action performs Optical Character Recognition (OCR) on either a target on the screen or an image opened in the Foxtrot Image Editor. This technique is especially useful when the Get Value action is unable to extract the needed text from the target.
The first time you run the OCR action, a one-time installation starts automatically. Complete it, and Foxtrot will be able to run OCR actions going forward.
The OCR action is able to OCR a specific area of a target or image to return the characters with very high precision. Furthermore, you are able to use grayscale and adjust the color settings to get the best result possible.
However, if you are looking for an OCR solution with more options, you could consider using the Tesseract OCR engine, as explained in the article below, or, for example, the Google Cloud Vision API, which offers a very powerful OCR engine. The appropriate OCR method depends entirely on the types of targets, images, or documents you are going to work with. It is important to note that OCR is an extremely difficult process to master, and most people have to compromise on their expectations.
- How-To Use Tesseract OCR (Open Source Google Engine)
- How-To Use Google Cloud Vision API (OCR & Image Analysis)
If you are working with PDF files and looking to OCR the content of the document, please make sure that you have read this article:
Simple OCR
Whether you are performing OCR on a target or an image, the concept of using the action is the same. However, generally speaking, it is recommended to use images, as you can then use additional tools, such as cropping the image before performing the OCR, which will help you achieve optimal results. If you need to OCR something on the screen, you could target it and use the Screenshot action to save a screenshot of the target directly to the Image Editor in order to OCR the image.
For this simple example, we target a window of the Notepad application and select the Screenshot action to save an image of the target to the Image Editor.
IMPORTANT: You definitely want to use the Resize Window action using "Exact size" before taking the screenshot of the application/window you wish to OCR. Why this is important will become clear later in the article.
Now you will have the image in the Foxtrot Image Editor.
To perform OCR on the image, simply go under "Images" in the action list and select the OCR action.
By default, the whole image will be selected as the area to OCR indicated by the purple line.
To deselect everything, something we recommend always doing before specifying your desired area to OCR, simply right-click on the image.
To maximize the image, something that is especially useful for bigger images, click the expand button in the top-right corner.
You can now, in the expanded window of the image, drag-and-drop the mouse to select the area to OCR like this.
This is the most difficult part of performing OCR using the Foxtrot OCR action. Why? Because you need to consider this key aspect:
- You want to be as precise as possible in your selection of the area to get the best result possible. However, can you be sure that the text is always in the area you selected? Can it possibly move?
IMPORTANT: This is why you definitely need to use the Resize Window action using "Exact size" before taking the screenshot of the target. Imagine that your Notepad window is the size shown below the next time you take the screenshot; you would then definitely not get the desired output.
It is important to be aware that the more precise you are in your selection of the area, the better the result will be. For example, the below selection is not optimal, as it includes differently sized fonts and a symbol in the top-right corner.
You want as much as possible to OCR areas containing the same font, size, and type of characters. So, for example, if you are interested in the title of the Notepad document, you should NOT do this:
The OCR action will attempt to recognize everything in the selected area. It might therefore interpret the icon as a character by mistake. Similarly, it will attempt to recognize anything in the top-left corner, which could potentially be interpreted as an "I". So, you need to be very precise and make a selection like this.
Notice how the selection extends further to the right than the actual title. Why? Because the title of the document might change to something longer than what it currently is, and we want to make sure that we capture everything even then.
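The idea of leaving room to the right of the title can be sketched in a few lines of Python. This is only an illustration of the reasoning, not something Foxtrot exposes: the `Box` type, the `widen_right` helper, and all pixel values are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """A rectangular selection in pixel coordinates."""
    left: int
    top: int
    right: int
    bottom: int


def widen_right(box: Box, extra: int, image_width: int) -> Box:
    """Extend the right edge by `extra` pixels without leaving the image."""
    new_right = min(box.right + extra, image_width)
    return box._replace(right=new_right)


# Hypothetical numbers: the title currently ends at x=180
# in a screenshot that is 800 pixels wide.
title_box = Box(left=30, top=5, right=180, bottom=25)
print(widen_right(title_box, extra=300, image_width=800))
# Box(left=30, top=5, right=480, bottom=25)
```

The top, bottom, and left edges stay tight so that icons and other fonts remain outside the selection; only the edge in the direction the text can grow is relaxed.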
When you are satisfied with your initial selected area, you can click OK to return to the main page of the OCR action. Here, you can hit the refresh button to load the preview of the output.
After you click the refresh button, the preview will be populated.
If you are not satisfied with the output, you can adjust the selected area until it works optimally. Alternatively, you can adjust the color settings. By clicking the toggle button, you can change the image to black and white.
Using the slider, you can adjust the image layout to find the optimal settings to achieve the best OCR possible.
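To get a feeling for why the black-and-white toggle and the slider can change the OCR result, here is a minimal sketch of thresholding in plain Python. It assumes the slider behaves like a brightness threshold, which the article does not confirm; the `to_black_and_white` function, the luminance weights, and the sample pixels are illustrative assumptions and not Foxtrot's implementation.

```python
def to_black_and_white(pixels, threshold=128):
    """Map each (r, g, b) pixel to 0 (black) or 255 (white).

    `pixels` is a list of rows, each row a list of (r, g, b) tuples.
    """
    result = []
    for row in pixels:
        out_row = []
        for r, g, b in row:
            # Common luminance weights (ITU-R BT.601); assumed, not Foxtrot's.
            gray = 0.299 * r + 0.587 * g + 0.114 * b
            out_row.append(255 if gray >= threshold else 0)
        result.append(out_row)
    return result


# One hypothetical row: white background, light gray, dark gray text, black.
row = [[(255, 255, 255), (200, 200, 200), (90, 90, 90), (0, 0, 0)]]

for threshold in (64, 128, 220):
    print(threshold, to_black_and_white(row, threshold))
```

A low threshold turns faint text white so that it disappears, while a high threshold turns light backgrounds black. This is the kind of trade-off you explore when moving the slider and refreshing the preview.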
When you are satisfied with the output, you specify the variable to save the output to and you will be able to run the action.
After running the action, the variable will be populated with the output of the OCR.
OCR PDF files
So far, we have looked at how to OCR a simple target like the Notepad application. Performing OCR on documents is basically the same. However, it is typically more complicated to get right, as documents often contain more text and their layout and quality vary.
To OCR a PDF file, you first need to convert the PDF to an image. To do this, we recommend using the Poppler utility library. At the end of the article, you will learn how to use the "pdftoppm.exe" program to convert PDF files to images.
For this article, let us take a look at two different PDF files (converted to PNG) to see how we could use the OCR action to retrieve data from them. Both of the files are available at the end of the article.
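As a rough sketch of what such a conversion can look like when scripted, the Python snippet below builds a pdftoppm command line. The `-png` and `-r` (resolution in DPI) options are standard pdftoppm options; the helper names, the 300 DPI value, and the file names are assumptions made for illustration, and the snippet assumes pdftoppm is on the PATH.

```python
import subprocess


def build_pdftoppm_command(pdf_path, output_prefix, dpi=300, exe="pdftoppm"):
    """Return the argument list for converting a PDF to PNG images."""
    return [exe, "-png", "-r", str(dpi), str(pdf_path), str(output_prefix)]


def convert_pdf_to_png(pdf_path, output_prefix, dpi=300, exe="pdftoppm"):
    """Run pdftoppm; raises CalledProcessError if the conversion fails."""
    command = build_pdftoppm_command(pdf_path, output_prefix, dpi, exe)
    subprocess.run(command, check=True)


# Only print the command here; call convert_pdf_to_png(...) to actually run it.
print(" ".join(build_pdftoppm_command("INV0001.pdf", "INV0001")))
```

pdftoppm writes one image per page and appends the page number to the output prefix. A higher resolution generally gives the OCR more pixels per character at the cost of larger images.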
Here is the first one, "INV0001.png".
And here is the second one, "Image_PDF.png".
There is an obvious difference in the quality of the two documents. However, both of them present challenges. The first document contains colors, varying font sizes and weights, and lines separating some of the text. The second document is quite poor in quality, its fonts also change, and it contains a large amount of text.
OCR invoice sample
Let us start by taking a look at the invoice sample document. After opening the image using the Open Image action, we can create a new OCR action.
The first thing we would want to do is to change the OCR action from color to black and white as the different colors might complicate the output. You can always change it back if the output is worsened by the black and white. Next, we need to consider what we wish to extract from the document. Now, typically we want to extract everything, right? But, is it a good idea to just select the whole image as the area?
Notice the preview in the above screenshot. The OCR is not aware of the layout (columns, etc.), so selecting the whole image is not a good idea. Also, as mentioned previously in the article, it is important to be precise in your selection of the area to OCR, as that greatly improves the chances of getting a correct output. Therefore, the best approach for this and similar documents is to create several OCR actions that each handle a different area of the document and store the results in different variables for further processing. This both improves the quality and makes it easier to know what each piece of text is about; if you simply OCR the whole document, you might not know where each sentence originates from. Below is a series of example screenshots of how this sample invoice document could be approached.
Now, it is crucial to keep in mind that the action is fully position-dependent. Everything is based on the exact coordinates of your selected areas. So, if your documents are scans from, for example, vendors, the size of the document or the position of the elements might change. That is also why it is sometimes necessary to use a different OCR tool, as mentioned in this article.
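As a rough illustration of why one area per OCR action works better than one area for the whole page, the toy model below uses a grid of characters in place of an image. It is not Foxtrot code; the page content, the region names, and the `read_region` helper are made up for the example.

```python
# A made-up two-column "page"; each string is one row of the image.
PAGE = [
    "Bill to:".ljust(16) + "Invoice: 0001",
    "Acme Corp".ljust(16) + "Date: 1 Jan",
]


def read_region(page, left, top, right, bottom):
    """Read the rows of a rectangular area, left to right, top to bottom."""
    rows = [" ".join(row[left:right].split()) for row in page[top:bottom]]
    return " ".join(text for text in rows if text)


# One big selection: the two columns get interleaved row by row.
print(read_region(PAGE, 0, 0, 40, 2))

# One selection per column, each stored under its own (hypothetical) variable.
regions = {"customer": (0, 0, 16, 2), "invoice_info": (16, 0, 40, 2)}
variables = {name: read_region(PAGE, *box) for name, box in regions.items()}
print(variables)
```

The whole-page read mixes "Bill to:" with "Invoice: 0001" because it simply follows the rows, whereas the per-region reads keep each column together and tell you which variable each piece of text belongs to.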
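To make the position dependence concrete, the sketch below shows what happens to a selection when the same document arrives at a different image size. It is a hypothetical pre-processing calculation done outside Foxtrot, with made-up coordinates, and it only covers the case where the page is scaled uniformly; it does not help when elements move around on the page.

```python
def scale_box(box, designed_size, actual_size):
    """Scale a (left, top, right, bottom) box from one image size to another."""
    designed_w, designed_h = designed_size
    actual_w, actual_h = actual_size
    x_ratio = actual_w / designed_w
    y_ratio = actual_h / designed_h
    left, top, right, bottom = box
    return (
        round(left * x_ratio),
        round(top * y_ratio),
        round(right * x_ratio),
        round(bottom * y_ratio),
    )


# Hypothetical: the area was selected on a 1000x1400 image,
# but the next scan arrives at twice the resolution.
total_box = (100, 50, 400, 80)
print(scale_box(total_box, (1000, 1400), (2000, 2800)))
# (200, 100, 800, 160)
```

With fixed coordinates, the original box would cover only the top-left quarter of the intended area on the larger scan. Converting every document at the same resolution and page size avoids the problem in the first place.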
OCR sample letter
Lastly, let us have a look at the second sample document that is quite different.
Even though the quality of the document is quite poor, Foxtrot is actually capable of performing impressive OCR. Again, we should break the document down into segments. First, we select the top header of the document and refresh the preview. Notice how it is able to read the characters perfectly. Of course, you could even select one line of text per OCR action if you think that is more appropriate.
Next, we move on to the first of the two columns above the actual body of text.
We should do the same for the second column.
Lastly, we can select the whole area of the body of text (everything except the footer).
And that's it! That is how you can use the OCR action in Foxtrot to perform OCR on targets, images, documents, etc.