OCR with PowerShell

I wrote a little function that uses Microsoft Office Document Imaging (MODI) to retrieve text from images with OCR.

I have put a few notes in-line in the script and have dummy-proofed it somewhat, but YMMV! Below the snippet I’ll show an example where I compare how well this technique recognizes several fonts at 12pt.
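
For readers who want to see the general shape of the technique, here is a minimal sketch. It is not the script from this post. It assumes MODI is installed (it shipped with Office 2003 and 2007) and registered as the `MODI.Document` COM class, and that PowerShell is probably running as a 32-bit process, since MODI is a 32-bit component. The function name `Get-ModiText` is made up for illustration.

```powershell
# Hypothetical helper: a sketch of MODI-based OCR, not the author's function.
function Get-ModiText {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Path
    )

    # Fail early with a clear message if the file is missing.
    if (-not (Test-Path -LiteralPath $Path -PathType Leaf)) {
        throw "Image file not found: $Path"
    }

    # COM does not share PowerShell's current location, so pass an absolute path.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

    # Throws here if MODI is not installed or not visible to this process.
    $doc = New-Object -ComObject MODI.Document
    try {
        $doc.Create($fullPath)

        # 9 = miLANG_ENGLISH; the two flags ask MODI to auto-orient and
        # straighten the page before recognition.
        $doc.OCR(9, $true, $true)

        # Images is zero-based; a single-frame JPG or BMP only has image 0.
        $text = $doc.Images.Item(0).Layout.Text

        # $false = do not save changes back to the image file.
        $doc.Close($false)
        $text
    }
    finally {
        # Release the COM object so the image file is not left locked.
        [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($doc)
    }
}
```

A call would look like `Get-ModiText -Path C:\temp\ocr-test.jpg`, where the path is only a placeholder.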

Here’s an example:

Image: OCR Test Image

Get-TextFromImage output:

Windows Powershell NODI OCR Test Image
l2pt COURIER NEW ABCDEFGHIJKLMNOPORSTUVLJXYZ
abcdefghijklmnopqrstuvwxyz
01234567890 

12 pt TAHOMA
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
0123456789

12 pt TERMINAL
ABCDEFCH I JICLMNOPQRSTUUIIXYZ
abc de f gh ii Ic inn o pqrs t tw wxyz
0123456789

12 pt VERDANA
ABCD EFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
0123456789

12 pt CONSOLAS
ABCDE FCHIJ KL KNOPQRSTUVHXYZ
abcdefghij kirnnopqrst uvwxyz
0123456789

12 pt Times New Roman
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklnmopqrstuvwxyz
0123456739

] pt OCR?A Extended ABCDEFGHI JKL1NOPQRSTUVLdXYZ
abcde fghij kimnopqrstuvwxy z
O]3456789

The OCR-specific font failed miserably. Funny, huh?

It appears that at 12pt in a JPG, Times New Roman is the best candidate for OCR using MODI via PowerShell if you want accurate results!
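
One way to put a number on "best" is to measure the edit distance between what the image actually says and what the OCR returned: the fewer edits, the better the font fared. The sketch below is a plain Levenshtein distance. The function name `Get-EditDistance` is hypothetical, and using edit distance as the score is an assumption, not something the original script does.

```powershell
# Hypothetical helper: case-sensitive Levenshtein distance between the
# expected text and the text returned by OCR.
function Get-EditDistance {
    param(
        [Parameter(Mandatory = $true)][AllowEmptyString()][string]$Expected,
        [Parameter(Mandatory = $true)][AllowEmptyString()][string]$Actual
    )

    # At the start of pass $i, $prev[$j] is the distance between the first
    # ($i - 1) characters of $Expected and the first $j characters of $Actual.
    $prev = @(0..($Actual.Length))

    for ($i = 1; $i -le $Expected.Length; $i++) {
        $curr = New-Object int[] ($Actual.Length + 1)
        $curr[0] = $i

        for ($j = 1; $j -le $Actual.Length; $j++) {
            # Substitution is free when the characters match exactly.
            $cost = if ($Expected[$i - 1] -ceq $Actual[$j - 1]) { 0 } else { 1 }

            $deletion     = $prev[$j] + 1
            $insertion    = $curr[$j - 1] + 1
            $substitution = $prev[$j - 1] + $cost

            $curr[$j] = [Math]::Min([Math]::Min($deletion, $insertion), $substitution)
        }

        $prev = $curr
    }

    $prev[$Actual.Length]
}

# Score two of the lowercase lines from the output above.
$expected = 'abcdefghijklmnopqrstuvwxyz'
Get-EditDistance -Expected $expected -Actual 'abcdefghijklnmopqrstuvwxyz'      # Times New Roman
Get-EditDistance -Expected $expected -Actual 'abcdefghij kirnnopqrst uvwxyz'   # Consolas
```

Summing the distances for the uppercase, lowercase, and digit lines of each font gives a single score per font that can be compared directly.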

Relevant links:
http://stackoverflow.com/questions/316068/what-is-the-ideal-font-for-ocr
http://cerealnumber.livejournal.com/47638.html
http://stackoverflow.com/questions/9277571/how-can-i-retrieve-modi-reference-from-com-in-my-application


5 thoughts on “OCR with PowerShell”

  1. Jeffrey Snover[MSFT]

    Howdy Rex!

    I didn’t realize that you could do this – that is cool. I was looking at your script and you might consider making a change. The PowerShell convention for passing in files is the -Path parameter. We also have a number of [VALIDATE…] attributes which do the work for you, so you get standardized error messages, and we’ll translate them when your script is running in other countries. Here is my suggestion:

    [Parameter(Mandatory=$true)]
    [ValidateScript({Test-Path $_})]
    [ValidatePattern("\.jpg$|\.jpeg$|\.bmp$")]
    [string]$Path

    Give it a try and see if you like it.

    Jeffrey Snover [MSFT]
    Distinguished Engineer and Lead Architect for Windows Server and System Center Datacenter

    1. Rex Hardin Post author

      Hi Jeffrey –

      That’s a very valid suggestion! I’ll update the post in the next day or two. I suppose I ought to production-ize scripts/snippets I publicize, huh? 😛

      I’ll be posting more interesting stuff on /r/PowerShell – keep an eye out! I have a small backlog of nifty things I’ve learned/encountered and have been meaning to blog about.

      Thanks!
      Rex

  2. tostaky

    Thanks a lot, I wanted something like this. It could be perfect, but sometimes it doesn’t work because of the colors.
    Is it possible to adjust the precision so that it can read colored text in pictures?
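
The parameter validation suggested in the first comment can be tried in isolation, without MODI installed. The sketch below wraps those attributes in a throwaway function (`Test-ImagePath` is a hypothetical name) and feeds it a file that exists but has the wrong extension, to show that PowerShell rejects the argument before the function body runs. The only change from the suggestion is that `Test-Path` is narrowed to files with `-PathType Leaf`, which is an assumption about the intent.

```powershell
# Hypothetical function used only to exercise the validation attributes.
function Test-ImagePath {
    param(
        [Parameter(Mandatory = $true)]
        # The path must point at an existing file.
        [ValidateScript({ Test-Path -LiteralPath $_ -PathType Leaf })]
        # ValidatePattern is case-insensitive by default, so .JPG also passes.
        [ValidatePattern('\.jpg$|\.jpeg$|\.bmp$')]
        [string]$Path
    )

    "Accepted: $Path"
}

# GetTempFileName creates an empty file with a .tmp extension, so the
# existence check passes and the extension check fails.
$tmp = [System.IO.Path]::GetTempFileName()
try {
    Test-ImagePath -Path $tmp
}
catch {
    # Parameter binding throws a terminating error with a standard message.
    "Rejected: $($_.Exception.Message)"
}
finally {
    Remove-Item -LiteralPath $tmp
}
```

Because the checks live on the parameter, the function body never has to test the path itself, and the error text comes from PowerShell rather than from hand-written messages.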
