Is it possible to read a PDF that is open online, extract its text contents, and store them in a file?

Question

I want to read a PDF and extract its text. The PDF is hosted at a URL and I don't want to download it; I want to read it on the fly from the internet. Is this even possible?
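In one sense, yes: the PDF never has to be saved to disk. The bytes still travel over the network, but they can be held in memory and handed to a parser that accepts a file-like object. Below is a minimal stdlib sketch of that, assuming the server answers a plain GET. The `pdf_text` helper uses pypdf, a swapped-in library rather than the Tika setup from the question, and assumes `pip install pypdf`. Both function names are hypothetical.

```python
import io
import urllib.request

PDF_MAGIC = b"%PDF-"


def open_pdf_in_memory(url: str, timeout: float = 30.0) -> io.BytesIO:
    """Download the PDF at `url` into RAM and return a seekable file-like object.

    Nothing is written to disk: the bytes live only in the returned BytesIO.
    """
    # Some servers reject Python's default User-Agent, so send a browser-like one.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = resp.read()
    # Search the first 1 KiB for the "%PDF-" header instead of requiring it at
    # offset 0, to tolerate a few leading junk bytes.
    if PDF_MAGIC not in data[:1024]:
        raise ValueError(f"{url!r} did not return PDF bytes")
    return io.BytesIO(data)


def pdf_text(buf: io.BytesIO) -> str:
    """Extract text page by page with pypdf (assumed installed)."""
    from pypdf import PdfReader  # imported lazily so the fetch step works without it

    reader = PdfReader(buf)
    # extract_text() may return little or nothing for scanned (image-only) pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```

A typical call would be `text = pdf_text(open_pdf_in_memory(URL_path))`. The `%PDF-` check fails fast when a URL returns an HTML error or login page instead of the document. It is a heuristic, not full validation.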

I tried using Tika, but it doesn't work. It gave me this error:

2019-08-29 15:39:15,416 [MainThread ] [WARNI] Tika server returned status: 500 {'status': 500}

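from tika import parser  # tika-python client; it talks to a Tika REST server, which needs Java

URL_path = "http://www.---path to .pdf"
raw = parser.from_file(URL_path)  # dict with "metadata" and "content" (plus "status")
print(raw)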
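With Tika, one option is to fetch the bytes yourself and pass them to tika-python's `parser.from_buffer` instead of giving `from_file` a URL. This separates the two places the request can fail. An HTTP error from the PDF's host surfaces as `urllib.error.HTTPError`, and a non-PDF body surfaces as `ValueError`, both before Tika is involved. A 500 that remains after that most likely comes from the Tika server itself. This is a sketch: `tika_text_from_url` is a hypothetical helper, and it assumes the `tika` package and a Java runtime are installed.

```python
import urllib.request


def tika_text_from_url(url: str, timeout: float = 30.0) -> str:
    """Fetch a PDF into memory, then hand the raw bytes to Tika (no temp file).

    Fetching separately means a bad URL fails here with a clear HTTP error,
    instead of surfacing only as a Tika status code.
    """
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    # urlopen raises urllib.error.HTTPError for 4xx/5xx HTTP responses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = resp.read()
    if b"%PDF-" not in data[:1024]:
        raise ValueError(f"{url!r} did not return a PDF (got {data[:20]!r}...)")

    from tika import parser  # lazy import: needs the tika package and a Java runtime

    parsed = parser.from_buffer(data)  # dict with "content", "metadata" and usually "status"
    if parsed.get("status") != 200:
        raise RuntimeError(f"Tika server returned status {parsed.get('status')}")
    return (parsed.get("content") or "").strip()
```

To store the result, something like `pathlib.Path("output.txt").write_text(tika_text_from_url(URL_path), encoding="utf-8")` would do, where `output.txt` is a placeholder name.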

So you want to read the content of a PDF file from an online link, and you are writing a Python script for that? — shuberman, Aug 29 '19 at 10:13
The "500" suggests that there is a problem with your request. Is the URL definitely correct? Does the server expect a GET request? If you copy the exact URL from your code and paste it into an incognito window in your browser, does it work? — saintamh, Aug 29 '19 at 10:16
@saintamh, thanks for your reply. Yes, the URL is definitely correct. I tried opening it in incognito mode and it opens a PDF. — developer, Aug 29 '19 at 10:17
@mishsx, thanks for the reply. Yes, I am trying to write a script that reads an online PDF and extracts its text. — developer, Aug 29 '19 at 10:18
But why, though? Unless you are doing this for a large number of URLs, it makes more sense to download the file and scan it with an OCR library. — shuberman, Aug 29 '19 at 10:20
@mishsx, you are right. I am doing it for more than 100 URLs. — developer, Aug 29 '19 at 10:24
Possible duplicate of [How can i grab pdf links from website with Python script](https://stackoverflow.com/questions/6222911/how-can-i-grab-pdf-links-from-website-with-python-script) — shuberman, Aug 29 '19 at 10:29
@mishsx, see, I don't need the PDF links. I need the PDF's text, extracted to a text file. — developer, Aug 29 '19 at 10:30
Well, that's one part of the problem, but the other part is already answered here: https://stackoverflow.com/a/45480440/7841468 — shuberman, Aug 29 '19 at 10:31
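For 100+ URLs, the per-URL work (download into memory, extract, write a `.txt`) can run concurrently, so the script is not stuck waiting on one slow download at a time. This is a sketch under several assumptions. `extract_all` and `fetch_bytes` are hypothetical names. The extractor is passed in as any `bytes -> str` callable, for example one built on pypdf or on Tika's `parser.from_buffer`. Output files are named by a hash of the URL. Failures are collected per URL rather than aborting the batch.

```python
import concurrent.futures
import hashlib
import pathlib
import urllib.request
from typing import Callable, Dict, Iterable, Union


def fetch_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Read the whole response body into memory (no temporary file)."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def extract_all(
    urls: Iterable[str],
    extract: Callable[[bytes], str],
    out_dir: Union[str, pathlib.Path],
    max_workers: int = 8,
) -> Dict[str, Union[pathlib.Path, Exception]]:
    """Fetch every URL concurrently, run `extract` on the bytes, save one .txt per URL.

    Returns url -> written path, or url -> the exception raised for that URL,
    so one broken link does not abort the whole batch.
    """
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    def work(url: str) -> pathlib.Path:
        text = extract(fetch_bytes(url))
        # Hypothetical naming scheme: hash the URL so file names are unique and filesystem-safe.
        path = out / (hashlib.sha1(url.encode("utf-8")).hexdigest()[:16] + ".txt")
        path.write_text(text, encoding="utf-8")
        return path

    results: Dict[str, Union[pathlib.Path, Exception]] = {}
    # Threads fit here because most of the time is typically spent waiting on the network.
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(work, url): url for url in urls}
        for fut in concurrent.futures.as_completed(futures):
            url = futures[fut]
            try:
                results[url] = fut.result()
            except Exception as exc:  # record the failure and keep going
                results[url] = exc
    return results
```

If all the URLs are on the same host, keep `max_workers` modest to avoid overloading it.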

