How-To Use Tesseract OCR (Open Source Google Engine)

Comments

6 comments

  • James Brammer

    I mean that right now it returns coordinates for each word it finds, but can it be made to discern whole lines of text?

    If a screenshot has multiple lines of text, including this one: "Cross Application Transactions Enter/Update"

    It will return the coordinates for "Cross", for "Application", for "Transactions", etc., with the X, Y, width, etc. for each.

    Can it be told to treat whole lines of text as one segment, instead of returning a result for each word in that line of text?

     

  • Mathias Balsløw

    I see. Well, the standard command for Tesseract (the one not returning coordinates) will give you the full text output with line breaks, etc. There is no command built into Tesseract that returns the coordinates of full sentences/lines of text, so you would have to build that logic yourself.

    But it shouldn't be too hard to do. You can use the Query List action to get rid of, for example, all blank rows in your list to begin with (so you only have the actual words detected). Then you could use either the value of the "line_num" column or the "top" column to determine when you reach the next line of text. There are other ways to do it, but that's the general idea. It's not super smooth, but those are the options offered by Tesseract.

  • James Brammer

    This is an awesome example writeup and sample script, thank you! Made template-izing it very easy.

    I've found a good trick for making the findings more accurate by eliminating clutter and cutting down on the amount of work Tesseract has to do, which also helps avoid false positives if a word could potentially be found in more than one place:

    Get your initial screenshot with the pre-determined window size and location, then use Foxtrot image crop on all four sides, using a variable for each side (CropLeft, CropRight, CropTop, CropBottom) to keep track of how much was cropped from the original.

    Now, during or immediately after retrieval of the Top and Left variables, just add the values of CropTop and CropLeft back to Top and Left respectively to get the true screen position.

  • James Brammer

    Is it possible to have Tesseract return text as whole lines/rows from the screenshots, or will it only return individual words? For a multi-word hit I have ended up adding the coordinates together and averaging them, and that worked, but it seems like a clumsy way to do it.

  • James Brammer

    Line_num! That one got past me. Awesome, I see how to do this now.

  • Mathias Balsløw

    James,

    Great input! I'm not sure I understand your question, though. Are you asking whether it is possible to return all the text from a screenshot? That is what the first part of the article illustrates and explains. Or are you referring to something else?

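The line-grouping idea from the thread (drop the blank rows, then use "line_num" to tell where one line of text ends and the next begins) could be sketched outside Foxtrot roughly as follows. This is a minimal Python sketch, not the article's script: it assumes the word list is Tesseract's standard TSV output (columns level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text), and the function name and sample data are hypothetical. Because line_num restarts within each paragraph, the sketch groups on page, block, paragraph and line together.

```python
import csv
import io


def group_words_into_lines(tsv_text):
    """Merge word-level Tesseract TSV rows into one bounding box per text line."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    lines = {}  # insertion-ordered, so lines come back in reading order
    for row in reader:
        text = (row.get("text") or "").strip()
        if not text:
            continue  # skip blank rows (page/block/paragraph/line entries)
        key = (row["page_num"], row["block_num"], row["par_num"], row["line_num"])
        left, top = int(row["left"]), int(row["top"])
        right, bottom = left + int(row["width"]), top + int(row["height"])
        if key not in lines:
            lines[key] = {"words": [text], "left": left, "top": top,
                          "right": right, "bottom": bottom}
        else:
            box = lines[key]
            box["words"].append(text)
            # Grow the box so it encloses every word on the line.
            box["left"] = min(box["left"], left)
            box["top"] = min(box["top"], top)
            box["right"] = max(box["right"], right)
            box["bottom"] = max(box["bottom"], bottom)
    return [
        {"text": " ".join(b["words"]), "left": b["left"], "top": b["top"],
         "width": b["right"] - b["left"], "height": b["bottom"] - b["top"]}
        for b in lines.values()
    ]


# Hypothetical sample data in Tesseract's TSV layout (coordinates are made up).
SAMPLE_TSV = "\n".join([
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num"
    "\tleft\ttop\twidth\theight\tconf\ttext",
    "4\t1\t1\t1\t1\t0\t10\t20\t425\t18\t-1\t",
    "5\t1\t1\t1\t1\t1\t10\t20\t50\t18\t96\tCross",
    "5\t1\t1\t1\t1\t2\t70\t21\t110\t17\t95\tApplication",
    "5\t1\t1\t1\t1\t3\t190\t20\t120\t18\t94\tTransactions",
    "5\t1\t1\t1\t1\t4\t320\t20\t115\t18\t93\tEnter/Update",
    "5\t1\t1\t1\t2\t1\t10\t50\t60\t18\t92\tSecond",
    "5\t1\t1\t1\t2\t2\t80\t50\t35\t18\t91\tline",
])

for line in group_words_into_lines(SAMPLE_TSV):
    print(line)
```

Taking the union of the word boxes gives a single rectangle per line, which avoids having to add up and average the individual word coordinates.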

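The cropping trick from the thread comes down to simple arithmetic, sketched here in Python under the assumption that Tesseract reports Left and Top relative to the cropped image. The function name and the numbers are hypothetical; CropLeft and CropTop correspond to the variables named in the comment. Only the left and top crops shift the origin, so CropRight and CropBottom are not needed for this step.

```python
def to_original_position(left, top, crop_left, crop_top):
    """Map a position found in the cropped image back to the uncropped screenshot."""
    # Cropping from the left/top moves the image origin by that many pixels,
    # so adding the crop amounts back restores the original coordinates.
    return left + crop_left, top + crop_top


# Hypothetical values: a word found at (35, 12) in an image that had
# 200 px cropped from the left and 150 px cropped from the top.
x, y = to_original_position(left=35, top=12, crop_left=200, crop_top=150)
print(x, y)
```

If the screenshot itself does not start at the top-left corner of the screen, the window's own screen position would presumably need to be added in the same way.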