
I have to admit, I love converting my old books to a digital format. I generally convert to EPUB, as it seems to be a standard for digital publishing. EPUB easily converts to other formats (such as MOBI for the Kindle) using calibre.
Here’s how I currently convert.
First I find one of the many old paperbacks I have. I start with a favorite, one I won’t mind reading again. And I do have to read it again, because the output has to be proofread and corrected.
Once I have it selected, I break out my Canon A590. I hold the book awkwardly as I try to snap a shot of a flat page. Then I turn the book or the page, depending, and take the next shot. The last book I worked on was a collection of short stories, so I only processed one story at a time. Snapping pics of a curling book is a crazy way to do it, but until I get my book scanner completed it will have to do! Mine will be based on one of Daniel’s older designs.
Once I’ve got everything converted to pixels, I open Scan Tailor, a free “interactive post-processing tool for scanned pages.” Scan Tailor is invaluable for moving the project along from pictures to something that can be OCR’d.
After running it through Scan Tailor, I’ve got .tiff files that can be easily read by the OCR software. I like Abbyy FineReader, although my version does not have the EPUB option that can be found in version 11. Abbyy does a good job of recognizing the text, and highlights items with which it had difficulty. These difficulties are mostly from the way I held the book and the camera at the same time. Abbyy lets me edit the text to fix these issues.
However, Abbyy is not a full-fledged word processor; I usually end up saving the output as a plain text file (.txt), then opening that with Microsoft Word. Abbyy does have an option to save directly to a Word file, but I find the output needs additional and unnecessary work to clean up the formatting, something I don’t need to worry about with a text file. With a text file, however, any special formatting is lost. For me, the special formatting is mostly italicized words. I usually do a scan through the book later in the process to find these words, or go back to the Abbyy file, or even save one version from Abbyy to Word and use that to find the formatted words using the advanced search options.
I find Word the easiest way to work with the document, as I’ve used Word for years. I can often spot things like unintended line breaks, extra spaces, etc. Also, Word’s spell-checker is easy and fast.
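For anyone who would rather script the first cleanup pass before opening the file in Word, here is a minimal Python sketch of the same idea: rejoin lines that were broken in the middle of a paragraph and collapse extra spaces. It is not part of my own workflow, and the function name `clean_ocr_text` is made up for illustration. It assumes paragraphs in the OCR output are separated by blank lines, and its hyphen handling is a rough heuristic that will also merge genuinely hyphenated words that happen to fall at a line end.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Tidy plain-text OCR output (hypothetical helper, not from any tool).

    Assumes paragraphs are separated by one or more blank lines.
    """
    # Normalize Windows and old Mac line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")

    cleaned = []
    # One or more blank lines mark a paragraph break.
    for para in re.split(r"\n\s*\n", text):
        # Rejoin words split across a line break: "tor-\nrents" -> "torrents".
        # Heuristic: this also merges real compounds such as "well-\nknown".
        para = re.sub(r"(\w)-\n(\w)", r"\1\2", para)
        # Any remaining single line break is an unintended one.
        para = para.replace("\n", " ")
        # Collapse runs of spaces and tabs into a single space.
        para = re.sub(r"[ \t]+", " ", para).strip()
        if para:
            cleaned.append(para)

    # Keep one blank line between paragraphs.
    return "\n\n".join(cleaned)
```

The result still needs a human proofread; a script like this only removes the most mechanical problems.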
From Word, I’ve found the best export option is copy/paste. I’ve tried saving as HTML, but oh my, Word adds a ton of HTML code to each and every paragraph. I like my HTML to be minimal. From my clipboard, it’s easy to paste into Sigil, an excellent (and free) EPUB editor. I could also open the text file directly from Sigil, but mine always opens as one big paragraph instead of breaking into paragraphs as it should. If there’s any final formatting, this is where I like to do it. “Sigil is a multi-platform EPUB ebook editor” (from their website) and it works really well.
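One possible way around the one-big-paragraph problem, which I have not made part of my routine, is to turn the text file into minimal HTML before it goes anywhere near Sigil. The Python sketch below wraps each paragraph in plain `<p>` tags and nothing else. The function name `paragraphs_to_html` is hypothetical, and it assumes paragraphs in the text file are separated by a blank line.

```python
from html import escape


def paragraphs_to_html(text: str) -> str:
    """Wrap blank-line-separated paragraphs in bare <p> tags (illustrative helper)."""
    # Assumption: a blank line separates paragraphs in the input text.
    blocks = [block.strip() for block in text.split("\n\n")]
    # Escape &, < and > so the text is safe inside HTML; leave quotes alone.
    return "\n".join(
        f"<p>{escape(block, quote=False)}</p>" for block in blocks if block
    )
```

The output is just the paragraph markup, so it could be pasted into the body of a chapter in Sigil’s Code View.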
Once I’m satisfied with my EPUB book (Sigil has a nice option to validate your EPUB format, as well as a ton of other useful features), I bring it into calibre. calibre is a wonderful program; I use it to organize all my ebooks in addition to using it for conversion purposes. I also prefer calibre’s metadata editor over Sigil’s.
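I do my conversions in the calibre window, but calibre also installs a command-line tool, `ebook-convert`, that takes an input file and an output file and picks the formats from the file extensions. Here is a hedged Python sketch of calling it for the EPUB-to-MOBI case. The helper names and the file names are made up for illustration, and it assumes `ebook-convert` is on your PATH.

```python
import shutil
import subprocess


def build_convert_command(source, target, title=None, authors=None):
    """Build an ebook-convert command line (hypothetical helper)."""
    # ebook-convert infers the formats from the file extensions.
    cmd = ["ebook-convert", source, target]
    if title:
        cmd += ["--title", title]
    if authors:
        cmd += ["--authors", authors]
    return cmd


def convert(source, target, **metadata):
    """Run the conversion; assumes calibre's ebook-convert is on the PATH."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is calibre installed?")
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(build_convert_command(source, target, **metadata), check=True)


if __name__ == "__main__":
    # Example file names only; substitute your own book.
    convert("my-book.epub", "my-book.mobi", title="My Book")
```

For a single book the calibre window is just as quick; a script like this mostly pays off when there is a batch to convert.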
So that’s how I convert to digital. Feel free to chime in with your methods in the comments section.