I am scraping a long table of HTML links (allowed under the site's ToS). However, every link is a JavaScript call (href="javascript:;"), so reading the href with get_attribute() does not return a usable URL. I don't want to actually click each link, since every click downloads a large PDF file.

Is it possible to get the ultimate href/link that is called, without actually clicking the link and downloading the file?

Thank you!

Stanford Wong

1 Answer

Yes, but it is not easy: you need to look at the JavaScript behind those links, since the links are probably generated dynamically.

The reason for writing <a href="javascript:;"></a> is explained in the question "What does href expression <a href="javascript:;"></a> do?"

In short: for an <a> element to render as a link in HTML, it needs an href, but sometimes there is no direct URL or the URL is computed at run time. You therefore need to find the JavaScript code that handles those links, most likely a click event listener.
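
As a rough illustration of that idea, here is a minimal Python sketch for the simplest case, where the handler is written inline in an onclick attribute. Everything specific in it is an assumption made up for the example: the sample markup, the openDoc() function name, and the URL template are hypothetical and would have to be replaced with whatever the real page's JavaScript actually does.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup: the real page may attach its handlers in a script file instead.
SAMPLE_HTML = """
<table>
  <tr><td><a href="javascript:;" onclick="openDoc('A17')">Report A</a></td></tr>
  <tr><td><a href="javascript:;" onclick="openDoc('B42')">Report B</a></td></tr>
</table>
"""

# Hypothetical URL pattern, as if it had been read from the site's openDoc() function.
URL_TEMPLATE = "https://example.com/docs/{doc_id}.pdf"


class OnclickCollector(HTMLParser):
    """Collects the inline onclick handlers of <a href="javascript:..."> elements."""

    def __init__(self):
        super().__init__()
        self.handlers = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        href = attr_map.get("href") or ""
        if tag == "a" and href.startswith("javascript:"):
            onclick = attr_map.get("onclick")
            if onclick:
                self.handlers.append(onclick)


def extract_urls(html):
    """Rebuilds download URLs from the handler arguments without clicking anything."""
    collector = OnclickCollector()
    collector.feed(html)
    urls = []
    for handler in collector.handlers:
        # Pull the single quoted argument out of the assumed openDoc('...') call.
        match = re.search(r"openDoc\('([^']+)'\)", handler)
        if match:
            urls.append(URL_TEMPLATE.format(doc_id=match.group(1)))
    return urls


print(extract_urls(SAMPLE_HTML))
```

With Selenium, the same inline handler text can be read with element.get_attribute("onclick") and passed through the same regular expression. If the listener is attached from a script rather than written inline, the attribute will be empty, and the URL logic has to be found by reading the page's scripts or by watching the request made for one link in the browser's network tab.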

davidluckystar