Optical Character Recognition

2013-07-13 ocr

What is OCR? It’s short for optical character recognition. ICR (intelligent character recognition) is a form of OCR meant for reading handwritten characters. OCR is essential in any business process where documents are scanned and their text or metadata is extracted in order to classify them.
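As a rough illustration of that classification step, here is a minimal Go sketch. It assumes the OCR text has already been extracted and sorts a document into a class by counting keyword hits. The class names, the keywords and the `classify` function are all hypothetical. Production capture systems may rely on trained classifiers, layout cues or metadata instead of plain keyword counts.

```go
package main

import (
	"fmt"
	"strings"
)

// rule maps a document class to keywords that suggest it.
// These classes and keywords are hypothetical examples only.
type rule struct {
	class    string
	keywords []string
}

var rules = []rule{
	{"invoice", []string{"invoice", "amount due", "bill to"}},
	{"receipt", []string{"receipt", "subtotal", "thank you for your purchase"}},
	{"contract", []string{"agreement", "hereinafter", "signature"}},
}

// classify returns the class whose keywords occur most often in the
// OCR text, or "unknown" if nothing matches. Matching is a naive,
// case-insensitive substring count; on a tie the earlier rule wins.
func classify(ocrText string) string {
	text := strings.ToLower(ocrText)
	best, bestScore := "unknown", 0
	for _, r := range rules {
		score := 0
		for _, kw := range r.keywords {
			score += strings.Count(text, kw)
		}
		if score > bestScore {
			best, bestScore = r.class, score
		}
	}
	return best
}

func main() {
	// prints "invoice"
	fmt.Println(classify("INVOICE #1042\nBill To: ACME Corp\nAmount Due: $120.00"))
}
```

Keyword counting is crude, since OCR errors and substring matches can skew scores, but it shows why clean text extraction matters before any classification can happen.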

Based on five years of experience with OCR engines, here are my recommendations, in order:

  • Tesseract: free and open source, with development sponsored by Google
  • OpenText RecoStar (Capture Recognition Engine) + Design Studio
  • ABBYY: also provides a screen reader for screen-scraping terminals
  • Nuance OmniPage

Tesseract itself, oddly enough, was first released as open source by HP together with staff from UNLV.

Surprisingly, there are many modern libraries that perform on-the-fly OCR on mobile phones. This technology benefits both consumers and business users, for example by reading the text of a receipt or business card directly through the phone’s camera.

For a Go implementation built on the Tesseract OCR engine, see my OCR Engine demo.
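For readers who want to try Tesseract from Go without cgo bindings, one minimal approach is to shell out to the `tesseract` command-line tool. This is an assumption and not necessarily how the OCR Engine demo works. The sketch assumes Tesseract 3.03 or later is installed and on `PATH`. In those versions, passing `stdout` as the output base prints the recognized text instead of writing a `.txt` file. The helper names `tesseractArgs` and `recognize` are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// tesseractArgs builds the argument list for the tesseract CLI.
// "stdout" as the output base makes tesseract print the text to stdout.
func tesseractArgs(imagePath, lang string) []string {
	args := []string{imagePath, "stdout"}
	if lang != "" {
		args = append(args, "-l", lang) // e.g. "eng"; the language data must be installed
	}
	return args
}

// recognize runs the tesseract binary on imagePath and returns the text.
// It assumes the tesseract executable can be found on PATH.
func recognize(imagePath, lang string) (string, error) {
	bin, err := exec.LookPath("tesseract")
	if err != nil {
		return "", fmt.Errorf("tesseract not found on PATH: %w", err)
	}
	cmd := exec.Command(bin, tesseractArgs(imagePath, lang)...)
	var stdout, stderr bytes.Buffer
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr // tesseract writes progress and errors here
	if err := cmd.Run(); err != nil {
		return "", fmt.Errorf("tesseract failed: %w: %s", err, strings.TrimSpace(stderr.String()))
	}
	return strings.TrimSpace(stdout.String()), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: ocr <image>")
		return
	}
	text, err := recognize(os.Args[1], "eng")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(text)
}
```

Shelling out keeps the Go build free of native dependencies at the cost of a process launch per image. Bindings to libtesseract avoid that overhead but require cgo.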