Can OCR be used to find a spot to click on?

Answered

16 comments

  • Mathias Balsløw

    Hi James!

Here's a short answer. I will write you a much more detailed answer tomorrow when I'm back in the office.

To answer your question directly: yes, this is possible. Whether OCR is your best option for what you describe, though, I'm not sure.

    In your situation, you have three options:
    - OCR (scan the image to extract the position of all characters, then based on that, click)
    - Pixel Matching / Image Recognition (based on an image of the button or label, locate it on the screen and click)
    - Use advanced automation to find the element and click

There are still some applications and systems where Foxtrot is not able to find every element in the window and offer you rules to target it reliably. We are definitely working on that with the new UIA technology, which is currently being improved and expanded to more applications. In the meantime, there are the three options above.

I've made three solutions based on Python code to offer users some extra functionality not currently embedded in the Foxtrot engine. These offer image detection and position-based commands:
    https://support.foxtrotalliance.com/hc/en-us/sections/360003453951

I know the documentation for the solutions is quite limited; improving it significantly is on our to-do list!

Then there's the OCR option. Foxtrot's OCR engine is currently not able to do what you are requesting, but that's something you could do via, for example, Google's OCR engine. It returns the exact locations of the recognised characters, and you could then use the position-based click from the Python solutions above to click wherever you want:
    https://support.foxtrotalliance.com/hc/en-us/articles/360024282351
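    As a rough illustration of the idea, here is a minimal sketch (not part of the linked solutions): it assumes an OCR result in the dict layout that pytesseract's image_to_data(..., output_type=Output.DICT) produces, with parallel lists "text", "left", "top", "width", and "height", and turns a matched word into a click target. The word "Submit" and the usage lines are hypothetical.

    ```python
    # Hypothetical sketch: turn OCR word boxes into a click target.
    # Assumes a pytesseract-style image_to_data dict with parallel lists
    # "text", "left", "top", "width", "height".

    def find_word_center(ocr, word):
        """Return the (x, y) center of the first box whose text matches `word`,
        or None if the word was not recognised."""
        for i, text in enumerate(ocr["text"]):
            if text.strip().lower() == word.lower():
                x = ocr["left"][i] + ocr["width"][i] // 2
                y = ocr["top"][i] + ocr["height"][i] // 2
                return x, y
        return None

    # Real usage would look roughly like this (requires pytesseract + pyautogui):
    #   import pytesseract, pyautogui
    #   from pytesseract import Output
    #   shot = pyautogui.screenshot()
    #   ocr = pytesseract.image_to_data(shot, output_type=Output.DICT)
    #   target = find_word_center(ocr, "Submit")
    #   if target:
    #       pyautogui.click(*target)
    ```

    Google's OCR engine returns bounding boxes in a different shape, but the same center-of-box arithmetic applies.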

Then there is the third option, advanced automation. I actually have a third Python solution ready; I just need to upload it and write the instructions. I'll post that for you tomorrow.

So to sum up: yes, this is possible, but it can be rather advanced depending on the situation. If you could upload an example screenshot of the app, that would help us guide you.

  • Mathias Balsløw

    Hi James,

    As a follow-up to my comment above, here is a new Python method that might be a better solution for you than OCR:

    https://support.foxtrotalliance.com/hc/en-us/articles/360024967691

  • James Brammer

    Thank you Mathias, I'm going to give this a try.

  • Mathias Balsløw

    James,

If you need any additional assistance, please let us know. I know my first answer is somewhat vague, but it is hard to give a concrete guide without knowing your application and specific issue. I hope my answer made it clear that it is definitely possible; however, you might need to utilize some supplementary tools to achieve it, whether one of the ones I mention or something else.

I can also let you know that we are currently finalizing three quite significant actions that, similar to the VBScript action, will allow you to write your own code for more custom actions:

    • Python
    • VB.Net
    • C#

Also, we are investigating how we could improve the current OCR action so you can scan not just a specific area of the window, but the whole window, and retrieve the position of a specific set of characters - exactly for situations like the one you describe.

  • James Brammer

pywinauto_Script looks like a great solution to a lot of trouble spots I've run into in the past, and I was able to get a few test clicks to work in simple UIs like Notepad. However, I cannot seem to get it to click on what I want in the Jack Henry Xperience system, even when I do manage to get it to click inside Xperience. I see in some of your recent release notes that Xperience support has been expanded; maybe a support case for Xperience interactions is what I really need here, because I am on the latest release and am still having trouble with Xperience.

Either way, this is an excellent additional tool to have. Thank you.

  • Mathias Balsløw

    Hi James,

Definitely. Xperience is mostly an American product, and Enablesoft has specifically developed enhancements to the targeting technology to fully support the application. So in this case, we could of course help you troubleshoot your issues with pywinauto_Script, but the best thing would be to contact the Success Team at Enablesoft (success@enablesoft.com) so they can take a closer look at your specific issues and the development team can solve them.

  • Uffe Sørensen

    Hi, I am looking to find a solution for more or less the same thing as James is explaining above. As per Mathias' replies, I guess the only way I would solve my issue is to use the OCR solution, but I have no option to install any software on my Citrix environment. Did anyone come up with a different solution for this issue?

One thing that I was actually hoping for was to have the built-in OCR in Foxtrot give the "right" output when it comes to line breaks. This way I would be able to make rules saying: if the word I am looking for is in line X, then the coordinates are X, Y... As you can see from my screenshots below, the variable does not always show the line breaks, and this makes it difficult for me to figure out which line my "word" is in.

So, let's say I am looking for the first word mentioning "DOCUMENT". I wanted to be able to count the rows down to "5" (by replacing each line break with some kind of symbol and then counting the symbols left of DOCUMENT), and by that have a list with coordinates. BUT, as you can see from the OCR value in the variable, Foxtrot is not recognizing the line breaks every time. :o/
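    For what it's worth, the row-counting idea can be sketched in a few lines, assuming the OCR text does have reliable line breaks (which, as noted, is the part Foxtrot does not guarantee). The first-row Y offset, row height, and X coordinate below are made-up example values for the menu layout:

    ```python
    # Sketch of the row-counting idea. Assumes the OCR text has reliable
    # line breaks; the coordinate constants are layout assumptions.

    def row_of_word(ocr_text, word):
        """Return the 1-based line number of the first line containing `word`,
        or None if it never appears."""
        for row, line in enumerate(ocr_text.splitlines(), start=1):
            if word in line:
                return row
        return None

    def click_point_for_row(row, first_row_y=120, row_height=18, x=200):
        """Map a row number to screen coordinates, given the Y of the first
        row and a fixed row height (both assumptions about the menu)."""
        return x, first_row_y + (row - 1) * row_height

    row = row_of_word("ARCHIVE\nINVOICES\nLETTERS\nREPORTS\nDOCUMENT", "DOCUMENT")
    # row is 5 here, so the click target would be click_point_for_row(row)
    ```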

    I hope that someone can help me with this issue.

  • Mathias Balsløw

    Hi Uffe!

You have probably already tested this, but do shortcuts work in that menu? For example, can you use the arrows to move up and down, and the right and left arrows (or space) to expand and collapse the subfolders? If you can do that, and if Ctrl+C puts the name of the current folder in the clipboard (that doesn't always work, but in some cases it does), then you could build the logic without needing the mouse or OCR.

OCR is not well-suited for this case, as the menu contains symbols, the folder icons, and dotted lines - all of which make it difficult for Foxtrot's OCR engine (or any other) to correctly detect the characters.

I know your colleague Ghita had a similar question with a similar, if not the same, menu, and I think she found a solution using some existing actions in Foxtrot.

    I know you mention that installing new software is not possible - but if you could, either Python or this solution that I've made based on Python could probably solve this issue: https://support.foxtrotalliance.com/hc/en-us/articles/360024967691

  • Uffe Sørensen

    Hi Mathias

    Thank you for your reply. The menu does allow up, down etc, but it does NOT allow Ctrl+C…

Ghita had another script where the folder is always the same, so she did not need to know where a certain folder is located in the menu. In my case, I need to find the folder DOCUMENT in the first loop; maybe I need the folder CORRESPONDANCE in the second loop, and so on. The folders needed depend on other things, and they vary from loop to loop.

As for the article you refer to, do you mean that I could then use the "location" command to find the X, Y? I have not worked with the DOS actions in Foxtrot yet, but I believe Ghita has already used one of your similar solutions, so I will get a little help there, I guess.

    Thanks

    Uffe

  • Mathias Balsløw

    Hi Uffe,

    Okay, I completely understand the issue now.

The link I refer to at the end is an article where I have written some code that allows you to locate and interact with elements (like the folders in your screenshot) that Foxtrot is not necessarily able to detect when you do drag-and-drop. Of course, I can't say for certain that it is able to do it, but I would expect it to be. The script locates the element on the screen and clicks on it - you do not need to worry about X and Y coordinates.

But, from what I understand from Ghita, even though an installation is not required, you are not allowed to download and use the solution from the link, as it is an .exe file. So I guess this will not help much, unfortunately, unless you can ask your IT department for permission to use it, or optimally to install Python.

    If none of that is possible, it would probably be best that we connect with you to take a closer look - maybe we can help you to find a good workaround that solves it. Let me know what you think.

  • Uffe Sørensen

According to Ghita, she now has the rights to run scripts with the help of the .exe files you provided. We are just not allowed to install Python, Tesseract, or any other applications.

    I need to drag and drop, not click, so I guess I need the coordinates for that…. or?

    Thx

    Uffe

  • Uffe Sørensen (Edited)

Actually, I would always be able to move the "marking" to the folder I need with up/down actions, so if there was a way to search for the BLUE field and get the coordinates, I guess that would work. It would mean somehow searching for solid blue pixels and returning the coordinates, but I am not sure if that is even possible...
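    Searching for a solid run of the highlight colour is doable in principle. Here is a minimal sketch, assuming you can get the screen pixels as rows of (r, g, b) tuples; the target colour, run length, and tolerance are assumptions you would sample from a real screenshot first:

    ```python
    # Sketch of the "find the blue selection bar" idea: scan pixel rows for
    # a solid horizontal run of the highlight colour. Colour, run length,
    # and tolerance are assumptions - sample them from a real screenshot.

    def find_color_row(pixels, target, min_run=10, tolerance=10):
        """Return (x, y) at the start of the first horizontal run of at least
        `min_run` pixels close to `target` (an (r, g, b) tuple), or None."""
        def close(p):
            return all(abs(a - b) <= tolerance for a, b in zip(p, target))
        for y, row in enumerate(pixels):
            run_start, run_len = None, 0
            for x, p in enumerate(row):
                if close(p):
                    if run_len == 0:
                        run_start = x
                    run_len += 1
                    if run_len >= min_run:
                        return run_start, y
                else:
                    run_len = 0
        return None

    # If Pillow/pyautogui were allowed, `pixels` could come from:
    #   img = pyautogui.screenshot()
    #   pixels = [[img.getpixel((x, y)) for x in range(img.width)]
    #             for y in range(img.height)]
    ```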

    Uffe

  • Markus Dalsgaard Sisseck

Hi Uffe, if you have time today, I would like to try out a solution with you. Do you have time to stop by our office at Skolegade 5?

  • James Brammer

You might still be able to tell when you are in the right folder with the built-in OCR, if some subfolders always exist in the same parent folders: use the subfolders to identify when you have found the right parent folder.

    • click in the search box
    • tab to the first folder and expand it to see its contents
    • take a screenshot, then crop it by a fixed amount so the only full words you see are the subfolders
    • OCR the cropped screenshot
    • if the OCR result contains all of the names (Name1 and Name2 and Name3 and Name4) that are always in the folder you want, then you have the correct folder
    • if not, collapse the folders, start over, and expand the next folder and OCR it, until you get a hit

That is very inefficient, of course, but Foxtrot has plenty of free time on its hands.
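    The decision step in that loop is just a containment check. A sketch, where the folder names are placeholders and the screenshot/crop/OCR calls are left as comments since they depend on your setup:

    ```python
    # Sketch of the subfolder check above. Folder names are placeholders;
    # the screenshot/crop/OCR steps (e.g. pyautogui.screenshot + pytesseract)
    # are left as comments.

    def is_target_folder(ocr_text, required_names):
        """True only if every expected subfolder name shows up in the OCR result."""
        text = ocr_text.lower()
        return all(name.lower() in text for name in required_names)

    # Loop sketch:
    #   for each top-level folder:
    #       expand it; screenshot and crop the subfolder area; OCR it
    #       if is_target_folder(ocr_text, ["Name1", "Name2", "Name3", "Name4"]):
    #           break  # correct parent folder found
    #       collapse it and move to the next folder
    ```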

  • Ghita Fjorback

    Hi James

We have more than 2,000 emails that need to be archived in these folders on a daily basis, so time is actually a factor as well. :o( The idea is good, but I am afraid it will take Foxtrot too long to archive all our emails, as the script also spends time opening attachments, checking them with OCR, etc.

I was trying to help Uffe, and I actually believe the best solution would be to maneuver to the correct folder with the help of the up, down, and right arrows. Once we are at the right folder, it will be "marked with a blue square", and this is where I believe I could use Mathias' pyautogui_Image solution. With a .png file of a piece of the blue square, I might be able to find the coordinates, or maybe even drag & drop directly to the folder.

So far I haven't had any luck capturing a picture of a piece of the blue square, as my snipping tool won't accept such a "small" capture, but I keep trying. Hehe... I might even need to capture a piece of the blue square together with some of the white. The issue is then that the drag-and-drop will always end in the center of the captured image, and that would be outside of the folder. So I might need to get the location X, Y instead, then make a calculation to make sure the location is right, and THEN do the drag & drop to that location. But I am just not sure if pyautogui has a way to return the location X, Y in a variable!? Does anyone know? I see that there is a command called "location", so it might be that...
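    For reference, pyautogui itself does return locations: locateOnScreen(image) gives a (left, top, width, height) box and locateCenterOnScreen(image) gives its center point. The offset-then-drag idea could be sketched like this, where the offsets, image filename, and source coordinates are assumptions to tune for the menu:

    ```python
    # Sketch: locate the blue-square image, then shift the drop point off
    # its center so it lands on the folder name. Offsets and filenames are
    # assumptions; the pyautogui calls are left as comments.

    def drop_target(box, dx=0, dy=0):
        """Center of a (left, top, width, height) box, shifted by (dx, dy)."""
        left, top, width, height = box
        return left + width // 2 + dx, top + height // 2 + dy

    # Usage sketch (requires pyautogui and a blue_square.png snippet):
    #   box = pyautogui.locateOnScreen("blue_square.png")
    #   if box is not None:
    #       x, y = drop_target(box, dx=40)   # nudge right, onto the folder
    #       pyautogui.moveTo(src_x, src_y)   # src_x/src_y: the email to drag
    #       pyautogui.dragTo(x, y, duration=0.5)
    ```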

    Ghita

  • James Brammer

That makes sense, but I don't understand why you can't use the pre-compiled Tesseract from @Mathias for finding the X, Y of text? If you are allowed to run executables that Enablesoft hands out, then it ought to work for you, since it does not require running an installer and does not require installing Python - it has already been compiled to run from an .exe. I could be wrong about that last part, but as far as I know, I didn't install Python on my Foxtrot workstation before running the executable.
