Joost van den Vondel's correspondence

Screenshot showing an impression of the Vondel correspondence website

During my university years, I had a side job helping a research team digitize and clean up the letters of the philosopher René Descartes. It was an interesting project, but the tasks involved were a bit boring.

For my latest experiment with GenAI, I tested whether GPT-4 could automate the manual tasks involved in this process. I did this by creating a digital edition of the correspondence of Joost van den Vondel, widely regarded as the greatest writer and poet in Dutch literature (comparable to Shakespeare in English literature).

The AI performed several tasks:

Converted the scanned PDF pages to text
Split the text into individual letters
Extracted sender, recipient, location, summary and date details
Translated the letters from French, Latin, and 17th-century Dutch into modern English
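
To make the extraction step a bit more concrete, here is a minimal Python sketch of how the model's answer for one letter could be checked before it is used. This is not the code from the notebooks. It assumes the model was asked to reply with a JSON object containing the keys sender, recipient, location, summary and date (with the date in ISO format), and the names LetterMetadata and parse_metadata are made up for this illustration.

```python
import json
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical schema: these field names are an assumption for this sketch,
# not necessarily what the project's notebooks use.
REQUIRED_FIELDS = ("sender", "recipient", "summary")


@dataclass
class LetterMetadata:
    sender: str
    recipient: str
    location: Optional[str]
    summary: str
    sent_on: Optional[date]  # None when the date is missing or not ISO-formatted


def parse_metadata(raw_json: str) -> LetterMetadata:
    """Validate a JSON reply from the model and turn it into a LetterMetadata."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    # Refuse replies that leave out fields every letter should have.
    missing = [name for name in REQUIRED_FIELDS if not data.get(name)]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))

    # Old letters are often undated or vaguely dated, so an unparseable
    # date is kept as None and left for a human to review.
    sent_on = None
    raw_date = data.get("date")
    if raw_date:
        try:
            sent_on = date.fromisoformat(raw_date)
        except (TypeError, ValueError):
            sent_on = None

    return LetterMetadata(
        sender=data["sender"],
        recipient=data["recipient"],
        location=data.get("location") or None,
        summary=data["summary"],
        sent_on=sent_on,
    )
```

A check like this does not make the extracted details correct, but it does catch replies that are malformed or incomplete, which is a cheap first layer of oversight.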

The results can be viewed here. The website is a simple static site that displays the letters in a list. Each letter is clickable and shows the text as generated by GPT-4o, the translation, and the extracted details.

As always when working with Large Language Models, human oversight remains an important factor in each step of the process. It would be interesting to see how that oversight could work in this setting, but that's beyond the scope of this side project.

The code for this project can be found on GitHub. It's a collection of Jupyter notebooks and a simple static website generated with Next.js.

This experiment confirms my earlier observations about Large Language Models: they can be simultaneously impressive and flawed. The LLM's ability to translate Dutch, French, and Latin into modern English was remarkable, and its extraction of dates, summaries, and geographical locations also worked well. However, no amount of prompting could get it to split the text into individual letters correctly.