Is there a way to use Python and Selenium to scrape information from online PDFs without downloading them?

Asked Jul 16 '20 at 07:38

Active Jul 16 '20 at 07:38

Viewed 285 times

url = 'https://dekaflow.dominionenergy.com/servlet/InfoPostServlet?region=null&company=cpt&method=headers&category=Capacity&subcategory=Operationally+Available'

I am trying to scrape information from the URL above. Clicking any link, such as "JULY Capacity Available 07/16/2020 Timely", opens a PDF in the same tab.

Is there any way to read the PDF and capture selected data from it without downloading it to a local directory?

I am using Selenium and Python.

Raj

Can you be more specific about what data you want to grab? Is it data about the link, or data within the PDFs? If it's in the HTML tags, it's quite easy to grab that data without needing to click the link. – AaronS Jul 16 '20 at 07:42
The data within the PDF. Not any HTML tags; there is a table within the PDF that contains the information I'm interested in. – Raj Jul 16 '20 at 07:49
Have you tried any of the following? https://stackoverflow.com/questions/47533875/how-to-extract-table-as-text-from-the-pdf-using-python/53050405 I've had a quick go with tabula, camelot and excalibur without success. My gut tells me you'll have to scan the PDF in and extract the text. – AaronS Jul 16 '20 at 08:25
OK, thank you for the effort. – Raj Jul 16 '20 at 08:33
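
For what it's worth, the "without downloading to the directory" part can be sketched independently of the table extraction. The idea is to take the link's href from Selenium, fetch the PDF bytes over HTTP, and keep them in an in-memory buffer rather than a file. This is only a sketch under assumptions: `cookie_header` and `fetch_pdf_bytes` are hypothetical helper names, it is not confirmed that this site serves the PDF to a plain HTTP request, and it does not solve the extraction problem mentioned in the comments above.

```python
import io
import urllib.request


def cookie_header(selenium_cookies):
    """Build a Cookie header value from the list of dicts that
    Selenium's driver.get_cookies() returns (keys 'name' and 'value')."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def fetch_pdf_bytes(pdf_url, selenium_cookies=(), timeout=30):
    """Fetch a PDF into memory only; nothing is written to disk.

    Returns an io.BytesIO positioned at the start of the PDF data.
    Raises ValueError if the response does not start with the PDF magic bytes.
    """
    request = urllib.request.Request(pdf_url)
    header = cookie_header(selenium_cookies)
    if header:
        # Reuse the browser session's cookies in case the server needs them
        # (an assumption; the site may not require any).
        request.add_header("Cookie", header)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = response.read()
    if not data.startswith(b"%PDF-"):
        raise ValueError("response does not look like a PDF")
    return io.BytesIO(data)
```

With Selenium, the inputs would come from something like `link.get_attribute("href")` for the PDF link and `driver.get_cookies()` for the session. The returned buffer can be handed to any PDF library that accepts a file-like object. Whether the table can then be read as text depends on how the PDF was produced; if it is a scanned image, OCR would still be needed, as suggested in the comment above.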

0 Answers