Extracting links from website using Python, NOT IN HTML

Question

I need to gather PDF-files from this page: http://www.anp.gov.br/?id=532.

I wonder how this is possible in Python when I can't find the links in the HTML source code. Previously I have found links to such files using BeautifulSoup and pandas.

Thanks for any answers!

Can you explain why you can't find the links in the HTML source code? I'm not sure I'm clear on the goal here. — Alex W, Jul 07 '15 at 17:15
Hi, Alex W! The developers who made the page have not written the links directly in the HTML source code; instead, they are called when clicked. I want these links so I can collect all the data and merge it into one Excel sheet. Thanks for the response, by the way! — Mathias Lia Carlsen, Jul 07 '15 at 17:18

score 4 · Accepted Answer · edited May 23 '17 at 11:51

It looks like all of the PDF links are in <a> tags, so you can use BeautifulSoup to grab them. If you need further advice, I recommend you reference this discussion to see how to accomplish that task.
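As a rough illustration of pulling PDF links out of `<a>` tags, here is a minimal sketch. It uses the standard library's `html.parser` rather than BeautifulSoup so that it runs without extra dependencies; the sample markup, the `example.com` base URL, and the names `PdfLinkParser` and `extract_pdf_links` are all made up for the example and are not taken from the actual page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a href="..."> links that point to .pdf files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags can carry the links we are after.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore anchors without an href; compare the extension
        # case-insensitively and disregard any query string.
        if href and href.lower().split("?")[0].endswith(".pdf"):
            # Resolve relative links against the page URL.
            self.pdf_links.append(urljoin(self.base_url, href))


def extract_pdf_links(html, base_url):
    """Return all PDF link URLs found in the given HTML string."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_links


# Hypothetical markup standing in for the real page source.
SAMPLE_HTML = """
<ul>
  <li><a href="/files/report_2014.pdf">Report 2014</a></li>
  <li><a href="http://example.com/docs/report_2015.PDF">Report 2015</a></li>
  <li><a href="/about.html">About</a></li>
  <li><a name="top">No href here</a></li>
</ul>
"""

links = extract_pdf_links(SAMPLE_HTML, "http://example.com/page")
for link in links:
    print(link)
```

With BeautifulSoup, the rough equivalent of the parser class would be looping over `soup.find_all("a", href=True)` and applying the same `.pdf` check to each `href`. For a live page, the HTML string would first have to be downloaded, for example with `urllib.request.urlopen`.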

[Screenshot showing the links to the PDF files]

answered Jul 07 '15 at 17:20

gffbss

The problem is just that the links are not in tags. – Mathias Lia Carlsen Jul 07 '15 at 17:25
Check the image I uploaded. I can see the links to the files; hopefully you can as well! If so, you can reference the discussion I linked to in order to get the URL from the `href` in the `<a>` tag. – gffbss Jul 07 '15 at 17:27
Thanks a lot! Found it now! – Mathias Lia Carlsen Jul 07 '15 at 17:32
No problem, happy to help :) – gffbss Jul 07 '15 at 17:39
