I'm using Python (I'm open to other languages such as Java or C++; I just arbitrarily chose Python) to scrape a PDF from a URL. I have no problem sending the GET request or acquiring the binary data, but to convert it into usable text I'm using PyPDF2. The problem is that I need to read through several hundred files, and writing each PDF to disk and then reading it back is extremely slow; the process takes over three minutes each time.
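Roughly, the per-file loop looks like the sketch below (simplified; the URL is a placeholder, and it assumes a recent PyPDF2 where the reader class is `PdfReader` rather than the older `PdfFileReader`):

```python
import requests
from PyPDF2 import PdfReader  # older PyPDF2 releases call this PdfFileReader

url = "https://example.com/sample.pdf"  # placeholder URL
response = requests.get(url)

# Write the downloaded bytes to a temporary file on disk...
with open("temp.pdf", "wb") as f:
    f.write(response.content)

# ...then re-open that file with PyPDF2 to pull out the text
reader = PdfReader("temp.pdf")
text = "\n".join(page.extract_text() or "" for page in reader.pages)
```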
I tried to use StringIO, but I had problems installing the module, and it seems really outdated. Ideally I'd like a module that can convert the raw binary data from a GET request into meaningful text. Does anyone know of a module like that?
Edit: The question was closed because a 14-year-old article was linked as answering my question, but it did not. tripleee answered my question, and I had success using Python's built-in io module.
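For anyone who lands here later, this is a minimal sketch of the in-memory approach that worked for me, again assuming a recent PyPDF2 (`PdfReader`; older releases use `PdfFileReader`) and a placeholder URL:

```python
import io

import requests
from PyPDF2 import PdfReader  # older PyPDF2 releases call this PdfFileReader

url = "https://example.com/sample.pdf"  # placeholder URL
response = requests.get(url)

# io.BytesIO wraps the raw bytes in a file-like object,
# so PyPDF2 can read the PDF without ever touching the disk
buffer = io.BytesIO(response.content)
reader = PdfReader(buffer)

text = "\n".join(page.extract_text() or "" for page in reader.pages)
```

Note that for binary data like a PDF the buffer has to be `io.BytesIO`, not `io.StringIO`, which only handles text.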