PyCon X


2nd - 5th May 2019

In Codice Ratio: Machine Transcription in the Vatican Secret Archive

In Codice Ratio (ICR) is a research project that studies tools and techniques for analyzing the contents of digitized historical documents from the Vatican Secret Archives (VSA). Because the documents are digitized as images, their text content remains inaccessible without expert human intervention; transcription is therefore a key enabler for search and automated knowledge discovery on such large collections. Handwritten documents are particularly challenging: traditional OCR does not apply, and state-of-the-art handwritten text recognition systems require very large datasets that are expensive to obtain. ICR’s transcription system is based on convolutional neural networks and statistical language models, and requires minimal dataset collection effort.

