Tutorial: Everything you always wanted to know about Tesseract

This tutorial will cover the algorithms, design and implementation of the open source OCR engine known as Tesseract.


Because Tesseract was designed largely in secret, its methods are not well known, yet it remains a formidable force in OCR and continues to improve. Its layout analysis placed second in the 2009 ICDAR competition. It supports more than 60 languages, including Chinese and several Indic languages, and recent changes make it easy to plug in new classifiers.


This tutorial will lay all the cards on the table, covering the following topics:


Optional hands-on opportunity:

This tutorial aims to provide a hands-on experience in which you build and run the latest Tesseract on your own machine, follow along with the demos, and possibly even make your own modifications! To take part, bring your own laptop with the following configuration: