
I'm relatively new, and I'm just at a loss as to where to start. I don't expect detailed step-by-step responses (though, of course, those are more than welcome), but any nudges in the right direction would be greatly appreciated.

I want to use the Gutenberg python library to select a text based on a user's input.

Right now I have the code:

from gutenberg.acquire import load_etext
from gutenberg.cleanup import strip_headers

text = strip_headers(load_etext(11)).strip()

where the number represents the text (in this case 11 = Alice in Wonderland).

Then I have a bunch of code about what to do with the text, but I don't think that's relevant here. (If it is, let me know and I can add it.)

Basically, instead of hard-coding a text, I want to let the user choose it. I want to ask the user for their choice of author and, if Project Gutenberg (PG) has pieces by that author, have them select from the list of book titles. (If PG doesn't have anything by that author, return some response along the lines of "sorry, don't have anything by $author_name, pick someone else.") Then, once the user has decided on a book, have the number corresponding to that book be entered into the code.

I just have no idea where to start in this process. I know how to handle user input, but I don't know how to take that input and search for something online using it.

Ideally, I'd be able to handle things like spelling mistakes too, but that may be down the line.

I really appreciate any help anyone has the time to give. Thanks!

Samuel Liew
Will

1 Answer


The gutenberg module includes facilities for searching for a text by metadata, such as author. The example from the docs is:

from gutenberg.query import get_etexts
from gutenberg.query import get_metadata

print(get_metadata('title', 2701))  # prints frozenset([u'Moby Dick; Or, The Whale'])
print(get_metadata('author', 2701)) # prints frozenset([u'Melville, Hermann'])

print(get_etexts('title', 'Moby Dick; Or, The Whale'))  # prints frozenset([2701, ...])
print(get_etexts('author', 'Melville, Hermann'))        # prints frozenset([2701, ...])

It sounds as if you already know how to read a value from the user into a variable, and replacing the literal author in the above would be as simple as doing something like:

author_name = my_get_input_from_user_function()
texts = get_etexts('author', author_name)
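To go from an author name to a chosen text number, something along these lines might work. It is only a sketch: `titles_for_author`, `choose_text`, and `closest_author` are names I made up, and the two lookup functions are passed in as arguments so the logic can be tried without the library installed. `closest_author` uses the standard library's `difflib` to guess at misspelled names, and it assumes you already have a list of known author names from somewhere, which the snippets above do not provide.

```python
import difflib


def titles_for_author(author_name, get_etexts, get_metadata):
    """Map each text number found for author_name to one of its titles."""
    titles = {}
    for text_id in sorted(get_etexts('author', author_name)):
        found = get_metadata('title', text_id)
        # A text may carry several titles (or none); keep one for display.
        titles[text_id] = min(found) if found else '(untitled)'
    return titles


def choose_text(author_name, get_etexts, get_metadata, ask=input):
    """Return the text number the user picks, or None if nothing matched."""
    titles = titles_for_author(author_name, get_etexts, get_metadata)
    if not titles:
        print("Sorry, don't have anything by {}, pick someone else."
              .format(author_name))
        return None
    for text_id, title in titles.items():
        print('{}: {}'.format(text_id, title))
    while True:
        answer = ask('Enter the number of the book you want: ').strip()
        if answer.isdigit() and int(answer) in titles:
            return int(answer)
        print('Please enter one of the numbers listed above.')


def closest_author(typed_name, known_authors):
    """Return the best fuzzy match for typed_name, or None if none is close.

    known_authors is a hypothetical list of author names that you would
    have to supply yourself.
    """
    matches = difflib.get_close_matches(typed_name, known_authors, n=1)
    return matches[0] if matches else None
```

With the real library you would call `choose_text(author_name, get_etexts, get_metadata)` and, if the result is not `None`, pass it to `load_etext` in place of the hard-coded `11`. Judging by the example above, author names are stored in "Last, First" form, so the user's input probably needs to follow that form to match.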

Be aware of the following caveat from the same section of the docs:

Before you use one of the gutenberg.query functions you must populate the local metadata cache. This one-off process will take quite a while to complete (18 hours on my machine) but once it is done, any subsequent calls to get_etexts or get_metadata will be very fast. If you fail to populate the cache, the calls will raise an exception.

With that in mind, I haven't tried the code I've presented in this answer because I'm still waiting for my local cache to populate.
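For reference, populating the cache is an explicit step you run yourself before the first query. As far as I can tell from the library's README, it looks roughly like the sketch below. The wrapper name `populate_metadata_cache` is my own; I haven't been able to verify the result yet, for the reason given above.

```python
def populate_metadata_cache():
    """Build the local metadata cache that gutenberg.query relies on."""
    # Imported inside the function so that merely defining it does not
    # require the gutenberg package or trigger any work.
    from gutenberg.acquire import get_metadata_cache

    cache = get_metadata_cache()
    cache.populate()  # the slow, one-off step described in the quoted note
```

Call `populate_metadata_cache()` once, let it finish, and only then use `get_etexts` or `get_metadata`.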

larsks
  • Thank you!! (I don't totally understand what "populating the local metadata cache" is actually referring to. Is it an automatic process that will happen the first time you search for a text by author - or any other metadata? Or is there some process you have to go through in the code itself first?) – Will Oct 15 '18 at 02:53
  • follow-up: I tried the code and got back this error: InvalidCacheException: The cache is invalid or not created. So it seems like there is something I need to do prior to running the code... – Will Oct 15 '18 at 02:57
  • update #2: oh, I should have just checked the link, whoops. Still, after trying the code they give to populate the cache, I get the error "OperationalError: (sqlite3.OperationalError) unable to open database file". Any idea what to do? – Will Oct 15 '18 at 03:02
  • Okay, I got it working. How much space is this gonna take up on my computer tho...haha – Will Oct 15 '18 at 03:44
  • I'm worried I broke my computer lol, I made another question, do you have any advice: [https://stackoverflow.com/questions/52822763/confused-about-populating-a-cache] – Will Oct 15 '18 at 19:35