
Sometimes Project Gutenberg includes the author or book title in a machine-readable way in its raw text files, but often it doesn't. I have a collection of Project Gutenberg raw text files that I would like to use and quote from in software (normally Python 3 or shell), and I would like to record the author and book title alongside each quote for future reference. Would NLTK be able to do this?

Ohiovr
  • The book page has that metadata: https://www.gutenberg.org/ebooks/100 – JonSG Feb 15 '22 at 17:40
  • What do these files look like? – Jan Wilamowski Feb 16 '22 at 05:35
  • The files are plain text with Project Gutenberg's addresses, but things like title and author are displayed inconsistently and are hard to parse. I couldn't use JonSG's suggestion with a scraper, as I would get IP banned. However, I found a much better way: simply hyperlinking to the file I am using on Gutenberg's site. I wish I had a slicker way, but this works. – Ohiovr Mar 06 '22 at 23:03
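
For files that do carry the fields in a readable form, a minimal sketch of pulling them out is below. It uses plain regular expressions rather than NLTK and assumes the header contains lines of the form `Title: ...` and `Author: ...` near the top, which many but not all Project Gutenberg files have. The names `extract_metadata`, `_FIELD_RE` and `max_chars` are hypothetical, and the sample text is made up for illustration.

```python
import re

# Assumption: the header has lines such as "Title: ..." and "Author: ...".
# Files that format these fields differently will simply yield None.
_FIELD_RE = re.compile(r"^(Title|Author):[ \t]*(.+)$", re.MULTILINE)


def extract_metadata(text, max_chars=5000):
    """Return {'title': ..., 'author': ...}; a field is None when not found."""
    meta = {"title": None, "author": None}
    # Only scan the start of the file, where the header normally sits.
    for name, value in _FIELD_RE.findall(text[:max_chars]):
        key = name.lower()
        value = value.strip()  # also drops a trailing "\r" from CRLF files
        if value and meta[key] is None:
            meta[key] = value  # keep the first occurrence of each field
    return meta


# Made-up sample header, not taken from a real file.
sample = (
    "The Project Gutenberg eBook of Example Book\n"
    "\n"
    "Title: Example Book\n"
    "\n"
    "Author: Jane Doe\n"
    "\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK EXAMPLE BOOK ***\n"
)
print(extract_metadata(sample))
```

When a field comes back as `None`, falling back to a stored link to the book page, as described in the comment above, avoids guessing.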

0 Answers