
I want to develop an eBook reader app. What are some good libraries available to parse formats like .azw, .mobi, .pdf etc.?

Richard
  • Some questions that might be helpful for you. [This](http://stackoverflow.com/questions/4665957/pdf-parsing-library-for-android) and [this](http://stackoverflow.com/questions/4773576/are-there-any-free-pdf-parsing-libraries-that-work-in-android). – Ranhiru Jude Cooray Jan 20 '12 at 11:42
  • Well yes, but that's just for PDFs. An EPUB is just a zipped set of HTML files, so that could be solved as well, but what about the other ones? – Richard Jan 20 '12 at 12:06
  • While finding a library might be the easiest solution (no judgement, I'd look for one too!), if you can't find one, investigate what these files actually are. At some point, they are either text or images. Find out what distinguishes one format from another. For instance, Richard says that epubs are zipped html. So, unzip it, and parse the html in your app. Surely you can find an html parsing library. Looks like it's going to be more work than you were hoping for, but it would be a good exercise. And hey, if you code it well, you could make an ebook library for others to use :) – Cody S Jan 22 '12 at 21:29
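
To make the "unzip it, and parse the HTML" idea from the comments concrete, here is a minimal sketch in Python using only the standard library. It assumes a DRM-free EPUB laid out the usual way, with a `META-INF/container.xml` that points at the package (OPF) file. The function names `find_opf_path`, `list_html_documents` and `read_document` are made up for this illustration, and on Android you would do the same thing with whatever ZIP and XML classes your platform offers.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by META-INF/container.xml in the EPUB container format.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_opf_path(epub):
    """Return the path of the package (OPF) file named in container.xml."""
    with zipfile.ZipFile(epub) as zf:
        root = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        return rootfile.get("full-path")


def list_html_documents(epub):
    """Return the names of all HTML/XHTML entries inside the archive."""
    with zipfile.ZipFile(epub) as zf:
        return [
            name
            for name in zf.namelist()
            if name.lower().endswith((".html", ".xhtml", ".htm"))
        ]


def read_document(epub, name):
    """Return one archive entry decoded as UTF-8 text (assumed encoding)."""
    with zipfile.ZipFile(epub) as zf:
        return zf.read(name).decode("utf-8")
```

Note that the archive listing is not the reading order. A real reader would parse the OPF file returned by `find_opf_path` and follow its spine, then hand each document to an HTML parser or a web view.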

2 Answers


As Ranhiru said, the two questions linked in the comments above show how PDFs can be parsed on Android. For .mobi, however, I'm not aware of any library, so you'll have to parse the format yourself. The format is documented on the MobileRead wiki.

With .azw files, it's different: if the Kindle ebook is DRM-free, its format is the same as .mobi, i.e. the two are interchangeable. Otherwise it's much harder, since you'd also have to generate a Kindle PID and strip the DRM from the .azw file. There's a guide on how to do that on the desktop here. However, I strongly advise against it, since it defeats the whole point of DRM and is illegal in many jurisdictions.

Ivan Zarea

For MOBI there isn't a complete spec sheet available, but you should start with the PDB (Palm Database) format, which MOBI uses and extends:

http://jola.comm.pl/palm/opispdb.htm
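
As a starting point, here is a rough sketch in Python of reading the PDB container that a MOBI file sits in. It assumes the commonly documented layout: a 78-byte big-endian header followed by one 8-byte entry per record. The names `PDB_HEADER`, `RECORD_ENTRY` and `parse_pdb` are just for this example, and it only covers the container, not the MOBI-specific headers or text decompression.

```python
import struct

# Palm Database (PDB) header: 78 bytes, all integers big-endian.
# name(32) attributes(2) version(2) created(4) modified(4) backed_up(4)
# modification_number(4) app_info_offset(4) sort_info_offset(4)
# type(4) creator(4) unique_id_seed(4) next_record_list(4) num_records(2)
PDB_HEADER = struct.Struct(">32sHHIIIIII4s4sIIH")

# One entry per record: 4-byte data offset, 1-byte attributes, 3-byte unique ID.
RECORD_ENTRY = struct.Struct(">IB3s")


def parse_pdb(data: bytes) -> dict:
    """Extract the database name, type, creator and record offsets."""
    if len(data) < PDB_HEADER.size:
        raise ValueError("too short to be a PDB file")
    fields = PDB_HEADER.unpack_from(data, 0)
    num_records = fields[13]

    offsets = []
    for i in range(num_records):
        pos = PDB_HEADER.size + i * RECORD_ENTRY.size
        offset, _attributes, _unique_id = RECORD_ENTRY.unpack_from(data, pos)
        offsets.append(offset)

    return {
        # The name is NUL-terminated inside its 32-byte field.
        "name": fields[0].split(b"\x00", 1)[0].decode("latin-1"),
        "type": fields[9].decode("ascii", "replace"),
        "creator": fields[10].decode("ascii", "replace"),
        "record_offsets": offsets,
    }
```

A MOBI file normally reports type `BOOK` and creator `MOBI`. Each record runs from its offset up to the next record's offset (or the end of the file), and as far as I know record 0 is where the PalmDOC and MOBI headers live, so that is the next thing to decode.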

duckduckgo