31

My application needs to retrieve information about any published book based on a provided ISBN, title, or author. This is hardly a unique requirement---sites like Amazon.com, Chegg.com, and even software like Book Collector seem to be able to do this easily. But I have not been able to replicate it.

To clarify, I do not need to search the entire database of books---only a limited subset which have been inputted, as in a book collection. The database would simply allow me to tag the inputted books with the necessary metadata to enable search on that subset of books. So scale is not the issue here---getting the metadata is.

The options I have tried are:

  1. Scrape Amazon. Scraping the regular Amazon pages was not very robust to things like missing authors, and while scraping the smaller mobile pages was faster, they shared the same issues with robustness of extraction. Plus, building this into an application is a clear violation of Amazon's Terms of Service.
  2. Scrape the Library of Congress. While this seems to have fewer legal ramifications, ease and robustness were again issues.
  3. ISBNdb.com API. While the service is free up to a point, and does a good job of returning the necessary metadata, I need to do this for over 500 books on a daily basis, at which point this service costs money proportional to use. I'd prefer a free or one-time payment solution that allows me to do the same.
  4. Google Book Data API. While this seems to provide the information I need, I cannot display the book preview as their terms of service require.
  5. Buy a license to a database of books. For example, companies like Ingram or Baker & Taylor provide these catalogs to retailers and libraries. This solution is obviously expensive, so I'm hoping that there's a more elegant solution I've missed. But if not, and someone on SO has had a good experience with a particular database, I'm willing to go with that.

I've tried to describe my approach in detail so others with fewer books can take advantage of the above solutions. But given my requirements, I'm at my wits' end for retrieving book metadata.

starball
Saketh

5 Answers

5

Since it is unlikely that you have to retrieve the same 500 books every day: store the data retrieved from isbndb.com in a database and fill it up book by book.
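A minimal sketch of this caching idea, assuming a local SQLite file is acceptable as the database. The `fetch_remote` callable is a hypothetical stand-in for whatever remote lookup you use (for example, a wrapper around the ISBNdb.com API); it is not part of any real client library. The remote service is only contacted on a cache miss, so repeated lookups do not count against a daily quota.

```python
import json
import sqlite3
from typing import Callable, Optional


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local metadata cache."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS books ("
        "isbn TEXT PRIMARY KEY, metadata TEXT NOT NULL)"
    )
    return conn


def lookup(
    conn: sqlite3.Connection,
    isbn: str,
    fetch_remote: Callable[[str], Optional[dict]],
) -> Optional[dict]:
    """Return metadata for an ISBN, calling fetch_remote only on a cache miss."""
    # Normalise so "978-0-13-110362-7" and "9780131103627" share one cache row.
    key = isbn.replace("-", "").strip().upper()

    row = conn.execute(
        "SELECT metadata FROM books WHERE isbn = ?", (key,)
    ).fetchone()
    if row is not None:
        return json.loads(row[0])

    # Cache miss: ask the (hypothetical) remote service once and remember the answer.
    metadata = fetch_remote(key)
    if metadata is not None:
        with conn:  # commits on success
            conn.execute(
                "INSERT OR REPLACE INTO books (isbn, metadata) VALUES (?, ?)",
                (key, json.dumps(metadata)),
            )
    return metadata
```

Passing a file path such as `open_cache("books.db")` instead of the in-memory default keeps the cache across runs, so the database fills up book by book over time.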

akira
  • I'd like to do this, but the limit of 500 books per day is a significant constraint whenever I load large (~30,000) inventories into the database. It would be ideal to either hack together an API for or purchase access to an existing database which I could then use without limits on the number of lookups. – Saketh Jul 20 '10 at 07:48
  • With that many items, it seems that you are going the professional route. I doubt that any service will let you basically clone their database without paying them (serious) money. – akira Jul 20 '10 at 08:09
  • The issue is that the inputting is staggered (e.g. 10,000 books at once, then none for some time), but the inputting must be done at once. – Saketh Jul 20 '10 at 08:59
4

This might be what you're looking for. They even offer a complete download! https://openlibrary.org/data
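A rough sketch of how the bulk download could be filtered down to just the books in a collection. It assumes each line of the dump has five tab-separated columns (record type, key, revision, last-modified timestamp, JSON record) and that edition records carry `isbn_10` / `isbn_13` lists; check both assumptions against the format notes on the data page before relying on this. The function names are made up for illustration.

```python
import json
from typing import Iterable, Iterator, Set


def parse_dump_line(line: str) -> dict:
    """Split one dump line into its type, key, and decoded JSON record."""
    # Assumed layout: type \t key \t revision \t last_modified \t JSON
    parts = line.rstrip("\n").split("\t", 4)
    if len(parts) != 5:
        raise ValueError("expected 5 tab-separated columns")
    record_type, key, _revision, _last_modified, raw_json = parts
    return {"type": record_type, "key": key, "record": json.loads(raw_json)}


def editions_for_isbns(lines: Iterable[str], wanted: Set[str]) -> Iterator[dict]:
    """Yield only the records whose ISBNs appear in the wanted set."""
    for line in lines:
        record = parse_dump_line(line)["record"]
        isbns = set(record.get("isbn_13", [])) | set(record.get("isbn_10", []))
        if isbns & wanted:
            yield record
```

Because the function takes any iterable of lines, a compressed dump can be streamed with `gzip.open(path, "rt", encoding="utf-8")` and passed in directly, without loading the whole file into memory.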

userSteve
4

Instead of scraping Amazon, you can use the API they expose for their affiliate program: https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html

It allows about 3k requests per hour and returns well-formed XML. It requires you to link to each book whose information you show, and you must state that you are an affiliate partner.

moritz
  • Requirements to access the API: have completed 3 qualifying sales in 180 days; have an approved Associates account; comply with the Associates Program Operating Agreement. – Roman Toasov Oct 25 '21 at 16:46
2

It seems that a lot of libraries and other organisations make information such as the ISBN available through MAchine-Readable Cataloging (MARC). You can find more information about it here as well.

Now that I know the "right" term to search for, I discovered WorldCat.org.

Maybe this whole MARC thing gives you a new kind of an idea :)

akira
  • There are no reasonable open or paid but easy-to-use ways of resolving the issue using MARC records, as sites like WorldCat generally require that one is a library in order to access their search API. I've been surprised, because one would think that a public catalog of books would be easy to find! – Saketh Jul 20 '10 at 09:02
  • So you can't use the search API (http://worldcat.org/devnet/wiki/SearchAPIDetails)? – akira Jul 20 '10 at 14:10
  • The WorldCat API uses an access key -- I have requested one, but if I could find an independent solution that would be great. – Saketh Jul 22 '10 at 02:51
  • I think the only way you can get access to the worldcat API is if you are a library. – Alioo Sep 19 '13 at 20:12
  • WorldCat API is free for developer sandbox via OCLC, http://www.oclc.org/developer/develop/web-services/worldcat-search-api.en.html – Yeo Mar 14 '16 at 22:48
0

OCLC's API. But you need to get an auth key somehow; I don't know how.

Or just scrape the worldcat.org page for the corresponding OCLC identifier number (e.g. for OCLC number 1180263022, you'd scrape the page 'https://www.worldcat.org/title/1180263022').

8c6b5df0d16ade6c