Albert Witteveen - Librelivre.nl PDF to Epub conversion on a GTX 970
Abstract
The Dutch national library has a website where you can download PDFs of old books. The PDFs are scans of the paper books. This is not the best format for reading, especially if you own an e-reader and want to use it. In 2025 it should be possible to use AI for this, right? This talk explores how well AI can already do this. It is actually possible, to some degree, to convert those scans into readable text on your local computer, even on a lowly GTX 970 or with just the CPU. However, there are definitely caveats. So I wrote some scripts to help with the conversion. The results of the scripts definitely need manual cleanup afterwards, but the final result is a proper Epub. There are even some scripts to modernise the spelling of the documents. The basis is a tab-delimited file that I keep on a public GitLab site and that grows each time I convert a PDF. Everything can be found on GitLab, with an accompanying website at Librelivre.nl.
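To give an idea of how a tab-delimited file can drive spelling modernisation, here is a minimal sketch. It is not taken from the actual scripts: the column layout (old spelling, a tab, then the modern spelling), the function names and the sample words are all assumptions made for illustration.

```python
import csv
import io
import re

WORD = re.compile(r"\w+")


def load_spelling_map(tsv_file):
    """Read 'old<TAB>new' rows into a dict keyed by the lowercased old form.

    The two-column layout is an assumption, not the project's real format.
    """
    mapping = {}
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank or incomplete rows instead of failing on them.
        if len(row) >= 2 and row[0].strip():
            mapping[row[0].strip().lower()] = row[1].strip()
    return mapping


def modernise(text, mapping):
    """Replace whole words found in the mapping, keeping an initial capital."""
    def swap(match):
        word = match.group(0)
        new = mapping.get(word.lower())
        if not new:
            return word
        if word[0].isupper():
            return new[0].upper() + new[1:]
        return new

    return WORD.sub(swap, text)


if __name__ == "__main__":
    # Hypothetical sample data standing in for the real tab-delimited file.
    sample = io.StringIO("mensch\tmens\nvisch\tvis\n")
    print(modernise("De Mensch eet visch.", load_spelling_map(sample)))
```

A plain word-for-word lookup like this cannot tell apart words whose modern form depends on context, which is one reason manual cleanup remains necessary.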
Under Dutch copyright law, works enter the public domain 70 years after the death of the author. Unfortunately, the terms of use of the national library's website do not let you use them freely until 140 years after publication. We will discuss this as well.
So in this talk I will discuss the state of AI for converting PDFs to Epubs, the viability of doing this on your own hardware (including results of performance tests), and the legal matters.
