Tutorial: Building Scalable Solutions for Document Retrieval and Recognition.
The scalability of a solution is an important consideration in enabling
retrieval and recognition over large collections of document images.
However, the definition of scalability is changing fast with the
emergence of huge datasets and digital libraries, as well as the advent
of new computing paradigms. In this tutorial, we shall cover three
approaches towards building scalable document image retrieval and
recognition systems:
- Recognition-free retrieval using bag-of-visual-words
- Recognition of word-images using indexing schemes
- Large-scale testing/deployment using cloud computing
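As a rough illustration of the first approach, the sketch below ranks word images by comparing bag-of-visual-words histograms, with no character recognition step. It is a minimal sketch under stated assumptions, not the tutorial's own code: the function names, the tiny hand-written vocabulary and the 2-D toy descriptors are all hypothetical. In practice the vocabulary would typically be learned by clustering local descriptors extracted from the document images.

```python
import numpy as np

def quantize(descriptors, vocabulary):
    """Assign each local descriptor to its nearest visual word (codebook row)."""
    # Squared Euclidean distance between every descriptor and every word.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bow_histogram(descriptors, vocabulary):
    """L2-normalised histogram of visual-word counts for one word image."""
    words = quantize(descriptors, vocabulary)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def rank(query_hist, index):
    """Return database ids sorted by decreasing cosine similarity."""
    scores = index @ query_hist  # rows of index are unit-length histograms
    return np.argsort(-scores)

# Hypothetical toy data: a 3-word vocabulary and 2-D "descriptors".
vocabulary = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
docs = [
    np.array([[0.1, 0.2], [0.3, -0.1], [9.8, 0.1]]),   # mostly word 0
    np.array([[0.2, 9.9], [-0.1, 10.2], [0.1, 9.7]]),  # only word 2
]
index = np.stack([bow_histogram(d, vocabulary) for d in docs])

query = np.array([[0.0, 10.1], [0.2, 9.8]])            # looks like word 2
ranking = rank(bow_histogram(query, vocabulary), index)
print(ranking)
```

The dense matrix product is used here only for brevity; because the histograms are sparse for large vocabularies, an inverted index over visual words is the usual way to keep such a search scalable.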
This tutorial will include a parallel hands-on practical session, where
attendees will have the opportunity to practice the methods described
in the tutorial. A dataset, along with the necessary code libraries,
will be provided to the audience. Multiple solution stacks will be
deployed and evaluated by the various groups and individuals, so that a
scalable retrieval system is built by the end of the tutorial.
The target audience of the tutorial includes:
- students who would benefit from a broad overview of state-of-the-art
approaches and access to code libraries to apply to their own datasets
- researchers who might appreciate that scalable solutions can be built in
practice, with slightly unconventional approaches that were possibly
discarded or overlooked earlier
- practitioners who might be interested in picking up know-how and tools
for quick deployment of document retrieval solutions