I'm trying to create an EPUB uploader to iBooks in Python. I need a Python library to extract book information. Before implementing this myself, I wonder if anyone knows of an existing Python library that does it.
I am voting to leave this question open, since it seems that at the time of asking there was no library implementing the required functionality, and I think that the accepted answer contains valuable code. – Gustav Bertram Dec 05 '13 at 09:09
The comment is not for you, but for the people voting to close the question. There is no reason to unaccept the answer, particularly as it solved your problem. – Gustav Bertram Dec 10 '13 at 13:42
Closing does not mean deleting; the answer is attracting link-only answers and maybe spam in the future. – bummi May 11 '15 at 05:19
4 Answers
An .epub file is a ZIP archive containing a META-INF directory with a file named container.xml. That file points to another file, usually named Content.opf, which indexes all the other files that make up the e-book (summary based on http://www.jedisaber.com/eBooks/tutorial.asp; full spec at http://www.idpf.org/2007/opf/opf2.0/download/).
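The chain described above (ZIP → container.xml → OPF → everything else) can be followed with nothing but the standard library. Below is a minimal sketch using `zipfile` and `xml.etree.ElementTree` (no lxml) that checks the `mimetype` entry the OCF spec requires to read `application/epub+zip`, follows container.xml to the OPF file, and lists every file declared in its `<manifest>`. The function name `list_epub_files` is made up for this example, and it assumes a well-formed EPUB whose first `<rootfile>` is the OPF package document.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used by container.xml and the OPF package document
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def list_epub_files(fname):
    """Hypothetical helper: return (path_in_zip, media_type) for each manifest item."""
    with zipfile.ZipFile(fname) as zf:
        # an EPUB container has a 'mimetype' entry holding exactly this string
        mimetype = zf.read("mimetype").decode("ascii").strip()
        if mimetype != "application/epub+zip":
            raise ValueError(f"unexpected mimetype: {mimetype!r}")

        # META-INF/container.xml points at the OPF package document
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).get("full-path")

        # manifest hrefs are relative to the directory that holds the OPF file
        opf_dir = posixpath.dirname(opf_path)
        package = ET.fromstring(zf.read(opf_path))
        return [
            (
                posixpath.normpath(posixpath.join(opf_dir, item.get("href"))),
                item.get("media-type"),
            )
            for item in package.findall("opf:manifest/opf:item", NS)
        ]
```

Resolving each `href` against the OPF's directory matters because many books keep the OPF in a subfolder such as `OEBPS/`, so the raw `href` is not a valid path inside the ZIP on its own.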
The following Python code will extract the basic meta-information from an .epub file and return it as a dict.
```python
import zipfile

from lxml import etree


def epub_info(fname):
    def xpath(element, path):
        result = element.xpath(
            path,
            namespaces={
                "n": "urn:oasis:names:tc:opendocument:xmlns:container",
                "pkg": "http://www.idpf.org/2007/opf",
                "dc": "http://purl.org/dc/elements/1.1/",
            },
        )
        # first match, or None if nothing matched (e.g. a book with no dc:date)
        return result[0] if result else None

    # prepare to read from the .epub file (closed automatically afterwards)
    with zipfile.ZipFile(fname) as zip_content:
        # find the contents metafile
        cfname = xpath(
            etree.fromstring(zip_content.read("META-INF/container.xml")),
            "n:rootfiles/n:rootfile/@full-path",
        )
        # grab the metadata block from the contents metafile
        metadata = xpath(
            etree.fromstring(zip_content.read(cfname)), "/pkg:package/pkg:metadata"
        )

    # repackage the data
    return {
        s: xpath(metadata, f"dc:{s}/text()")
        for s in ("title", "language", "creator", "date", "identifier")
    }
```
Sample output:
```
{
    'date': '2009-12-26T17:03:31',
    'identifier': '25f96ff0-7004-4bb0-b1f2-d511ca4b2756',
    'creator': 'John Grisham',
    'language': 'UND',
    'title': 'Ford County'
}
```
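If you would rather not depend on lxml, the same lookup can be sketched with the standard library's `xml.etree.ElementTree`, whose `find`/`findall` accept a prefix-to-namespace map. This variant (the name `epub_info_stdlib` is just for illustration) returns a list for each field, because OPF metadata may repeat elements such as `dc:creator`, and an empty list when a field is absent. It assumes the same OPF layout as the code above (`<package>` → `<metadata>` → `dc:*` elements).

```python
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "n": "urn:oasis:names:tc:opendocument:xmlns:container",
    "pkg": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}
FIELDS = ("title", "language", "creator", "date", "identifier")


def epub_info_stdlib(fname):
    with zipfile.ZipFile(fname) as zf:
        # container.xml tells us where the OPF package document lives
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find("n:rootfiles/n:rootfile", NS).get("full-path")
        package = ET.fromstring(zf.read(opf_path))

    metadata = package.find("pkg:metadata", NS)
    # collect every value of each Dublin Core field; a missing field gives []
    return {
        field: [(el.text or "").strip() for el in metadata.findall(f"dc:{field}", NS)]
        for field in FIELDS
    }
```

Returning lists keeps co-authors that a "first match only" lookup would silently drop; callers who want a single value can take `info["title"][0]` after checking the list is non-empty.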

Sure enough, epubs are zip files with a different extension. :) – Brōtsyorfuzthrāx Sep 20 '18 at 04:14
Something like epub-tools, for example? But that's mostly about writing epub format (from various possible sources), as is epubtools (similar spelling, different project). For reading it, I'd try the companion project threepress, a Django app for showing epub books in a browser -- haven't looked at that code, but I imagine that in order to show the book it must surely first be able to read it;-).

@xiamx, yes, "mostly about writing" as I said -- so, have you tried the threepress code? – Alex Martelli Jun 27 '10 at 02:08
I wound up here after looking for something similar and was inspired by Mr. Bothwell's code snippet to start my own project. If anyone is interested ... http://epubzilla.odeegan.com/
