I am making a framework to easily "appify" books. The framework needs to automatically detect chapters and headings in order to build a table of contents. It should also make it easy to search through the text and find what you are looking for.

What I still need to figure out is:

  1. how to store the data in such a way that I can easily detect the chapters and headings
  2. and still be able to search through the text.

The stored text needs to be formatted, so I thought I would store it as HTML or Markdown (which would be translated to HTML). I don't think the text would be very searchable if it is stored as HTML.

P.S. It does not have to be HTML if there are other, more efficient ways to format the text.
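
One way to address the searchability concern is to keep the formatted HTML for display and index a plain-text copy of it. The following is only a minimal sketch of that idea in Python, using the standard library's `html.parser`; the names `TextExtractor`, `html_to_text`, and `BLOCK_TAGS` are hypothetical, and the list of block-level tags is an assumption that a real book format would need to extend.

```python
from html.parser import HTMLParser

# Assumption: tags that should act as word boundaries in the plain text.
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and drops the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")  # keep adjacent blocks from fusing

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Return the plain text of an HTML fragment with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


chapter_html = "<h1>Chapter 1</h1><p>It was a <em>dark</em> and stormy night.</p>"
print(html_to_text(chapter_html))
```

The plain-text result can then be fed to whatever search mechanism is used, while the original HTML stays untouched for rendering.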

Cœur
OmerSakar
  • "I don't think it would be very searchable if the text is in HTML" -- it is as "searchable" as is Markdown, IMHO. My [book](https://commonsware.com/Android)'s APK edition stores a copy of the prose in a SQLite FTS3 database, purely for full-text searching, and it does so using HTML, not the Markdown that I write in. – CommonsWare Apr 23 '17 at 12:00

1 Answer

Do you really want to do such a thing on the device itself?

I suggest using a separate SQLite database for every book, with separate tables for the table of contents, the chapters, summarized keywords for each chapter (for faster search), and other service info.
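
A rough sketch of what such a per-book database could look like, written in Python with the standard `sqlite3` module purely for illustration (the SQL itself would be the same on a device). The table names, column names, and the helpers `create_book_db`, `add_chapter`, and `chapters_for_keyword` are all assumptions of mine, not a fixed design.

```python
import sqlite3

SCHEMA = """
CREATE TABLE chapters (
    id       INTEGER PRIMARY KEY,
    position INTEGER NOT NULL,      -- reading order within the book
    title    TEXT NOT NULL,
    body     TEXT NOT NULL          -- formatted chapter text
);
CREATE TABLE toc (
    id         INTEGER PRIMARY KEY,
    chapter_id INTEGER NOT NULL REFERENCES chapters(id),
    level      INTEGER NOT NULL,    -- 1 = chapter, 2 = section, ...
    heading    TEXT NOT NULL
);
CREATE TABLE keywords (
    chapter_id INTEGER NOT NULL REFERENCES chapters(id),
    keyword    TEXT NOT NULL
);
CREATE INDEX idx_keywords_keyword ON keywords(keyword);
"""


def create_book_db(path=":memory:"):
    """Open (or create) the database for one book and install the schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_chapter(conn, position, title, body, keywords):
    """Store a chapter, its top-level TOC entry, and its lowercased keywords."""
    cur = conn.execute(
        "INSERT INTO chapters (position, title, body) VALUES (?, ?, ?)",
        (position, title, body),
    )
    chapter_id = cur.lastrowid
    conn.execute(
        "INSERT INTO toc (chapter_id, level, heading) VALUES (?, 1, ?)",
        (chapter_id, title),
    )
    conn.executemany(
        "INSERT INTO keywords (chapter_id, keyword) VALUES (?, ?)",
        [(chapter_id, kw.lower()) for kw in keywords],
    )
    return chapter_id


def chapters_for_keyword(conn, keyword):
    """Return chapter titles tagged with the keyword, in reading order."""
    rows = conn.execute(
        "SELECT c.title FROM chapters c "
        "JOIN keywords k ON k.chapter_id = c.id "
        "WHERE k.keyword = ? ORDER BY c.position",
        (keyword.lower(),),
    )
    return [title for (title,) in rows]


db = create_book_db()
add_chapter(db, 1, "Beginnings", "The storm broke at dawn.", ["storm", "dawn"])
add_chapter(db, 2, "Departure", "They left before the storm returned.", ["storm", "tide"])
print(chapters_for_keyword(db, "storm"))
```

The keyword table is the "faster search" shortcut: a lookup hits a small indexed table instead of scanning every chapter body.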

Also, here you can find a full-text search example.
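
For reference, a full-text search table in SQLite can be sketched roughly as follows. This is an illustration in Python's `sqlite3` module, not the linked example; the table name `chapter_search` and the helpers `create_search_index` and `search` are hypothetical. It assumes the SQLite build ships the FTS5 or FTS4 extension, which is not guaranteed on every platform, so the sketch tries both.

```python
import sqlite3


def create_search_index(conn):
    """Create a full-text table, preferring FTS5 and falling back to FTS4."""
    for module in ("fts5", "fts4"):
        try:
            conn.execute(
                f"CREATE VIRTUAL TABLE chapter_search USING {module}(title, body)"
            )
            return module
        except sqlite3.OperationalError:
            continue  # this SQLite build lacks the module; try the next one
    raise RuntimeError("this SQLite build has neither FTS5 nor FTS4")


def search(conn, query):
    """Return the titles of chapters matching a full-text query."""
    rows = conn.execute(
        "SELECT title FROM chapter_search "
        "WHERE chapter_search MATCH ? ORDER BY rowid",
        (query,),
    )
    return [title for (title,) in rows]


conn = sqlite3.connect(":memory:")
create_search_index(conn)
conn.executemany(
    "INSERT INTO chapter_search (title, body) VALUES (?, ?)",
    [
        ("Beginnings", "The storm broke over the harbour at dawn."),
        ("Departure", "They left the harbour before the tide turned."),
    ],
)
print(search(conn, "harbour"))
print(search(conn, "storm"))
```

The `MATCH` operator uses the full-text index rather than scanning each row, which is what makes this approach faster than `LIKE '%word%'` on large texts.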

I also recommend bundling your own SQLite build with your app.

Now let's talk about your main problem: book scraping. I have no expertise here, but I believe this problem is the same as scraping websites.

Update: Please do not store the book contents as HTML. You can store them as Markdown, for example: it takes less storage, is easier to sanitize, and you can always apply your styles later.
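
Markdown also makes chapter and heading detection fairly simple, since headings are marked explicitly. Below is a minimal sketch in Python, assuming the book uses only ATX-style headings (lines starting with one to six `#` characters); it does not handle underlined (Setext) headings or `#` lines inside code blocks. The names `HEADING` and `build_toc` are hypothetical.

```python
import re

# One to six '#', at least one space, the heading text, optional closing '#'s.
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*#*\s*$")


def build_toc(markdown):
    """Return (level, heading text, line number) for each ATX heading."""
    toc = []
    for line_no, line in enumerate(markdown.splitlines(), start=1):
        match = HEADING.match(line)
        if match:
            toc.append((len(match.group(1)), match.group(2), line_no))
    return toc


book = """# Chapter 1
Intro text.
## The Storm
More text.
# Chapter 2
"""

for level, text, line_no in build_toc(book):
    # Indent nested headings to show the hierarchy.
    print("  " * (level - 1) + f"{text} (line {line_no})")
```

The resulting list could be written into a table-of-contents table once, when the book is packaged, so the app never has to parse the text at runtime.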

Community
  • The idea is to have one book per app. I don't know what you mean by "book scraping", but if you mean turning a physical book into HTML or any other form, well, that just takes a lot of time. I do not know if there is an easier way, but if there is, it should support multiple languages. – OmerSakar Apr 23 '17 at 13:14
  • Maybe I've got your question wrong. I was talking about reading through already digitized books with a "scraping" algorithm, which creates a graph that represents a book and then "flattens" that graph into storage. – Andrew Dementiev Apr 23 '17 at 13:22
  • I have been looking into how to use Markdown in a TextView instead of HTML. But the thing is that all Markdown is first translated into HTML before it is put into a TextField. – OmerSakar Apr 25 '17 at 11:36