I just found out about Schema.org and would like to use it in my web pages. So far I have only a very basic and rather confused idea of it, and right now I can’t afford the time to study it deeply enough to use it properly in the pages I am currently building.
So, here is my problem:
I converted a huge 670-page book (with quite a few pictures in addition to text) into HTML5 pages. The book is a PDF file. I broke it into 23 chunks and converted those chunks into the same number of HTML5 files, using a free/trial converter (PDF to HTML5+SVG). Unlike normal HTML pages, these HTML5 files have no visible dependencies or external assets (images, JS, CSS, etc.). Besides the images from the original PDF, the original text has also been converted to SVG images rather than kept as text, and embedded or encoded in the HTML files, I think. I don’t see any external files; each page seems to be self-contained, with a lot of code. In other words, the entire content of the book seems to be inside those HTML files alone. I am not familiar with such HTML files, and I am not sure whether this is possible or whether I am missing something due to my lack of knowledge.
Anyway, inside the source code of those HTML files, I would like to tell search engines (and other interested parties, if any), in as Google-friendly a manner as possible, using Microdata or JSON-LD, that:
This file (each individual HTML5 chunk) is a part or chunk (not necessarily a ‘chapter’) of a “Book” or “EBook” (isPartOf? PublicationIssue?). There are other similar files, and together they make up the entire book.
The main content of the book (and therefore of the individual HTML files) is mostly in image format, probably SVG+XML: bookFormat / BookFormatType / ImageObject / associatedMedia / MediaObject / encoding / encodesCreativeWork / encodingFormat? (My understanding was that the converter is supposed to add an extracted text file, or just the extracted text, to facilitate search, but I can’t find that.)
Additional properties: numberOfPages for the entire book (not for the individual chunks or HTML files), about, sameAs (for the main site), description.
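To make the list above concrete, this is roughly the shape I imagine for one chunk page in JSON-LD. It is only a sketch and not validated: the chunk URL, the entry-page URL, the "about" and "description" text and the sameAs target are all placeholders I made up, and I have left out the SVG/encoding point because I don't know which property fits it.

```html
<!-- Sketch only: every URL and text value below is a placeholder. -->
<script type="application/ld+json">
{
  "@context": "http://schema.org/",
  "@type": "WebPage",
  "@id": "http://example.com/chunk-05.html",
  "isPartOf": {
    "@type": "Book",
    "@id": "http://example.com/book.html#book",
    "bookFormat": {"@id": "http://schema.org/EBook"},
    "numberOfPages": 670,
    "about": "Placeholder subject of the book",
    "description": "Placeholder summary of the book.",
    "sameAs": "http://example.com/"
  }
}
</script>
```

Here numberOfPages sits on the Book node, so it describes the whole book rather than the chunk.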
My problem is that, with my present knowledge, I am not sure which Schema.org types and properties to choose for the context described above, how to write them concisely with valid syntax, and where to place them in the source of the HTML files. The content of the files looks to me like jumbled, almost undecipherable code, sparsely sprinkled with bits of the original text. It looks as if all the fonts, text and images of the original are encoded in the same place, and I can hardly tell them apart. So my idea is to start at the body tag with Microdata and encapsulate everything else inside one or two div or span elements, with no need to identify items separately.
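In Microdata terms, the wrapping idea might look something like the following. Again this is an untested sketch under my own assumptions: the itemid URL is a placeholder, and typing the page as WebPage is my guess, not something I have confirmed is the right choice.

```html
<!-- Sketch only: the itemid URL is a placeholder. -->
<body itemscope itemtype="http://schema.org/WebPage">
  <!-- Invisible description of the book this page belongs to -->
  <div itemprop="isPartOf" itemscope itemtype="http://schema.org/Book"
       itemid="http://example.com/book.html#book">
    <link itemprop="bookFormat" href="http://schema.org/EBook">
    <meta itemprop="numberOfPages" content="670">
  </div>
  <div>
    <!-- The converter's generated content would stay here unchanged -->
  </div>
</body>
```

The generated SVG and font data would not need any itemprop attributes of their own in this layout.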
That’s it! Can anybody help?
UPDATE BASED ON UNOR'S REPLY
Here is the code I think I will settle on (some questions remain):
To be placed in the Table of Contents page of the book/ebook (which has the title of the book as its header and will also be the entry page):
<script type="application/ld+json">
{
  "@context": "http://schema.org/",
  "@id": "http://example.com/Archaeological_Heritage_Of_India.html#book",
  "@type": "Book",
  "name": "Archaeological Heritage of India",
  "bookFormat": {"@id": "http://schema.org/EBook"},
  "inLanguage": "en",
  "genre": "Archaeological Heritage"
}
</script>
(OR, for the last property: "genre": "http://vocab.getty.edu/aat/300054328")
To be placed in the rest of the pages of the book (i.e. the separate individual HTML files):
<script type="application/ld+json">
{
  "@context": "http://schema.org/",
  "isPartOf": "http://example.com/Archaeological_Heritage_Of_India.html#book"
}
</script>
What I would like to know is whether this is completely correct.
Also, how can I, and should I, incorporate contentLocation in this (in no. 1) to indicate the geographical limit or focus of the main content of the book? How about the following:
"contentLocation": "India"
(OR should the value be the ISO 3166-1 alpha-2 country code, "IN"?)
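Since Schema.org lists Place as the expected type for contentLocation, another possibility might be a nested Place that carries both the name and the country code. This is only a sketch of that idea, reusing the book node from no. 1; whether search engines prefer it over a plain string is something I have not verified.

```html
<!-- Sketch only: nested Place instead of a plain text value. -->
<script type="application/ld+json">
{
  "@context": "http://schema.org/",
  "@id": "http://example.com/Archaeological_Heritage_Of_India.html#book",
  "@type": "Book",
  "name": "Archaeological Heritage of India",
  "contentLocation": {
    "@type": "Place",
    "name": "India",
    "address": {
      "@type": "PostalAddress",
      "addressCountry": "IN"
    }
  }
}
</script>
```

With this form, both "India" and "IN" are stated, so there would be no need to choose between them.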