A method to machine-read messy early manuscript sources and automatically create clean, structured historical data.

I have been experimenting with Transkribus, an automatic handwriting recognition program. It is applied to the transcription of letters and handwritten notes, but has been used far less for creating databases from historical sources.

The image above shows a training model for text recognition I am building for the port books of Bridgewater 1672-77.

Port books contain entries for millions of English and Welsh merchant voyages between 1565 and 1790. The script has historically been very hard to transcribe, so most of the data in this source remains unstudied. Can Transkribus help to unlock the vast repository of information about trade, shipping and consumption in early modern Europe held in this source and others?

Using Transkribus, I have now created a dataset of Newcastle shipping movements from the 1590s containing 11,000 observations. The slides below describe the results and the method developed:

Handwritten text recognition

The following text describes some of the elements:

Transkribus and database creation



