DAS2014_tutorial1

Tutorial: Building Scalable Solutions for Document Retrieval and Recognition.

Scalability is an important consideration when enabling retrieval and recognition over large collections of document images. However, what counts as scalable is changing rapidly with the emergence of huge datasets and digital libraries, and with the advent of new computing paradigms. In this tutorial, we shall cover three approaches to building scalable document image retrieval and recognition systems:

Recognition-free retrieval using bag-of-visual-words
Recognition of word-images using indexing schemes
Large-scale testing/deployment using cloud computing
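The first approach, recognition-free retrieval with a bag of visual words, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the tutorial's own code: it assumes local descriptors (for example, SIFT-like vectors) have already been extracted from each word image and that a visual vocabulary has been learned offline (typically by k-means clustering). The toy 2-D descriptors, the three-word vocabulary, the document names, and the `BoVWIndex` class are all hypothetical. Each descriptor is assigned to its nearest visual word, each image becomes a tf-idf weighted, L2-normalized histogram, and an inverted index scores only the images that share at least one visual word with the query (cosine similarity).

```python
import math
from collections import Counter, defaultdict


def nearest_word(desc, vocab):
    """Index of the closest visual word (squared Euclidean distance)."""
    return min(range(len(vocab)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(desc, vocab[k])))


def quantize(descriptors, vocab):
    """Map each local descriptor to a visual-word id and count occurrences."""
    return Counter(nearest_word(d, vocab) for d in descriptors)


class BoVWIndex:
    """Hypothetical bag-of-visual-words index with tf-idf weighting."""

    def __init__(self, vocab):
        self.vocab = vocab               # assumed pre-learned centroids
        self.docs = {}                   # doc_id -> Counter of visual words
        self.index = defaultdict(dict)   # word -> {doc_id: weight}
        self.idf = {}

    def add(self, doc_id, descriptors):
        self.docs[doc_id] = quantize(descriptors, self.vocab)

    def _weigh(self, hist):
        # tf-idf weights, then L2 normalization so dot products are cosines
        vec = {w: tf * self.idf[w] for w, tf in hist.items() if w in self.idf}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {w: v / norm for w, v in vec.items()}

    def build(self):
        n = len(self.docs)
        df = Counter()
        for hist in self.docs.values():
            df.update(hist.keys())
        # Smoothed idf (assumption) keeps every weight positive
        self.idf = {w: math.log(1 + n / df[w]) for w in df}
        self.index = defaultdict(dict)
        for doc_id, hist in self.docs.items():
            for w, v in self._weigh(hist).items():
                self.index[w][doc_id] = v

    def query(self, descriptors, top_k=5):
        q = self._weigh(quantize(descriptors, self.vocab))
        scores = defaultdict(float)
        # Only posting lists of the query's visual words are visited
        for w, qv in q.items():
            for doc_id, dv in self.index.get(w, {}).items():
                scores[doc_id] += qv * dv
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Toy usage with made-up descriptors and vocabulary
vocab = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
idx = BoVWIndex(vocab)
idx.add("page1_word1", [(0.5, 0.2), (9.8, 0.1), (10.2, -0.3)])
idx.add("page1_word2", [(0.1, 9.7), (0.3, 10.4)])
idx.add("page2_word1", [(0.2, 0.1), (10.1, 0.4), (0.2, 9.9)])
idx.build()
results = idx.query([(0.1, 0.3), (9.9, 0.2)])
print(results)
```

Here the query matches "page1_word1" best and "page2_word1" second, while "page1_word2" is never scored because it shares no visual word with the query. Skipping non-matching images in this way is what lets an inverted index keep lookups cheap as the collection grows.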

This tutorial shall include a parallel hands-on practical session, in which attendees will have the opportunity to practice the methods described in the tutorial. A dataset, along with the necessary code libraries, will be provided to the audience. Multiple solution stacks shall be deployed and evaluated by the participating groups and individuals, so that a scalable retrieval system is built by the end of the tutorial.
The target audience of the tutorial includes:

students who would benefit from a broad overview of state-of-the-art approaches and from access to code libraries that they can apply to their own datasets
researchers who might appreciate that scalable solutions can be built in practice, using slightly unconventional approaches that were possibly discarded or overlooked earlier
practitioners who might be interested in picking up know-how and tools for the quick deployment of document retrieval solutions.