Automatically create chapter breaks and titles from plain text file?

Question

I have a big plain text file that is divided into categories, with a line of underscores beneath each category name. The file changes constantly and has 80 categories. I would like each category to become a separate chapter in an EPUB file, with the category name as the chapter title. Is there a way to do this automatically with Calibre? Maybe some regex magic? For example, for the categories below I would like the chapter titles to be Fruit, Vegetables and Herbs, and I would like the file to be parsed automatically (one way would probably involve matching the underscore lines with a regular expression). How can I do this?

Fruit
________
Apples
Bananas

Vegetables
____________
Cucumbers 
Zucchini

Herbs
_____
thyme
cayenne

score 4 · Answer 1 · answered Nov 27 '12 at 09:49

So your text file is basically almost a Markdown file. I would convert it to HTML with something like pandoc (note that pandoc supports some extended Markdown features and is very capable; it can even generate EPUB directly, although I have not tested that yet).

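That way your headers (the underlined lines) would be translated to <h*> tags. One catch: Markdown only treats = (level 1) and - (level 2) as heading underlines and reads a line of underscores as a horizontal rule, so the underscores need to be replaced with = first.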

Then you can use, for example, Calibre's ebook-convert command-line tool (or the GUI) to convert it to MOBI or EPUB and specify the chapter breaks (thanks to the developers, Calibre has really good documentation). I also just noticed that Calibre's ebook-convert can convert Markdown directly to EPUB/MOBI.

For example:

ebook-convert input.html output.epub --chapter 'YOUR XPATH TO DETECT CHAPTERS' --chapter-mark pagebreak
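If you would rather skip the Markdown step, the HTML that ebook-convert needs can also be generated directly. The Python sketch below is one way to do that, assuming every category name is a non-empty line directly followed by a line of three or more underscores; the script name make_html.py and the function text_to_html are hypothetical. Each category name becomes an <h1>, and every other non-empty line becomes its own <p>.

```python
import html
import re
import sys

# Assumption: a category name is a non-empty line directly followed by a
# line made only of underscores (at least three of them).
UNDERLINE = re.compile(r"^_{3,}\s*$")


def text_to_html(text: str) -> str:
    """Convert underscore-underlined category names into <h1> elements."""
    lines = text.splitlines()
    body = []
    i = 0
    while i < len(lines):
        line = lines[i].strip()
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if line and UNDERLINE.match(nxt):
            body.append(f"<h1>{html.escape(line)}</h1>")
            i += 2  # skip the title and its underline
            continue
        if line:  # any other non-empty line becomes its own paragraph
            body.append(f"<p>{html.escape(line)}</p>")
        i += 1
    return (
        '<!DOCTYPE html>\n<html>\n<head><meta charset="utf-8">'
        "<title>Categories</title></head>\n<body>\n"
        + "\n".join(body)
        + "\n</body>\n</html>\n"
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python make_html.py categories.txt input.html
    with open(sys.argv[1], encoding="utf-8") as src:
        html_text = text_to_html(src.read())
    with open(sys.argv[2], "w", encoding="utf-8") as dst:
        dst.write(html_text)
```

With that file, a command along the lines of ebook-convert input.html output.epub --chapter '//h:h1' --chapter-mark pagebreak --level1-toc '//h:h1' should start a new chapter at each category and list the category names in the table of contents (calibre's XPath tutorial uses the h: prefix for XHTML elements, as in //h:h1).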

score 0 · Answer 2 · answered Jul 02 '17 at 09:23

This is easy. Your file is already close to Markdown, so pandoc can do most of the work; you can give it a .md extension (but you do not have to; see below).

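In Markdown, a line underlined with = becomes a level-one heading and a line underlined with - becomes a level-two heading. A line of underscores is not a heading underline (Markdown reads it as a horizontal rule), so replace the underscores with = or switch to # headings before converting. The rule where the first underline style you use becomes level one, the next new style level two, and so on, is how reStructuredText works; pandoc reads that format with --from rst.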

I personally prefer ATX-style headings: start a level-1 heading with #, a level-2 heading with ##, and so on.
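Rather than editing the underlines by hand in an 80-category file that keeps changing, a small preprocessing step can do it. The following Python sketch rewrites every underlined title as a #-style heading, giving the first underline character it meets level 1, the next new character level 2, and so on. The function name underlines_to_atx, the script name to_md.py and the set of accepted underline characters are assumptions for illustration.

```python
import re
import sys

# Assumption: an underline is 3+ repeats of one of these characters on its own line.
UNDERLINE = re.compile(r"^([_=~*^+-])\1{2,}\s*$")


def underlines_to_atx(text: str) -> str:
    """Rewrite underlined titles as '#'-style Markdown headings.

    The first underline character seen maps to level 1, the next new one
    to level 2, and so on.
    """
    levels = {}  # underline character -> heading level
    lines = text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        title = lines[i].strip()
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        match = UNDERLINE.match(nxt)
        # An underline line is never itself treated as a title.
        if title and match and not UNDERLINE.match(lines[i]):
            level = levels.setdefault(match.group(1), len(levels) + 1)
            if out and out[-1].strip():
                out.append("")  # pandoc's Markdown wants a blank line before a heading
            out.append("#" * level + " " + title)
            i += 2  # skip the title and its underline
        else:
            out.append(lines[i])
            i += 1
    return "\n".join(out) + "\n"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python to_md.py categories.txt > myTextFile.md
    with open(sys.argv[1], encoding="utf-8") as src:
        sys.stdout.write(underlines_to_atx(src.read()))
```

The output can then go straight into pandoc. Note that in Markdown consecutive lines such as Apples and Bananas are joined into one paragraph; if you want them kept on separate lines, pandoc's hard_line_breaks extension (--from markdown+hard_line_breaks) should handle that.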

Generating an .epub takes about a second with pandoc; a sample command is below:

pandoc myTextFile.md -o myEpubFile.epub

No LaTeX engine is involved in EPUB output; it only matters for PDF. When I produce a PDF I add --latex-engine=xelatex (called --pdf-engine since pandoc 2.0) because of some Unicode characters expected in the text, but if the text is plain English (ASCII) you don't need it. Just like that you can also produce .pdf and .docx files in seconds.

If you would like to keep your input file extension .txt, that is not a problem; just specify --from markdown on the command line and the input will be read as Markdown regardless of the file extension. Of course, a binary format like .docx can't be read that way; you will get an error message instead.

What I like about this method is that it is lightning quick, adjustable and does not require me to open calibre.

By default, pandoc also starts a new chapter at each level-1 heading. You can change this with the --epub-chapter-level option (deprecated in favor of --split-level in pandoc 3.0).
