I'm not sure I'm asking this question correctly, but I ran into something I've never seen before, and since my research didn't turn up anything quite like it, I'm confused:
I'm trying to scrape certain links from this page. I go through the usual steps:
import requests
from bs4 import BeautifulSoup

r = requests.get(url)
html = r.text
soup = BeautifulSoup(html, "lxml")
To locate the links I'm after, I do:
exh = soup.find_all('a')
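and then print the targets, roughly like this (exh is just the list returned by the call above):

links = [a.get('href') for a in exh]
print(links)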
The output contains a couple of URLs in the usual format, but many of them look like this (one chosen at random):
exhibit103.htm
On the rendered page in Firefox, this entry does not appear clickable, but if you hover over it, it briefly shows the actual underlying link.
The part of the HTML/CSS that I consider relevant to this section looks like this:
<td>
  <div>
    <a style="-sec-extract:exhibit;"href="exhibit103.htm">
      <span>Amendment Two [etc.]</span>
    </a>
  </div>
</td>
To my uninformed eyes, this looks like an href inside another href, i.e. nested links. So the general question is: why would anyone bother with this? The more important question (to me) is: how do I use BeautifulSoup (or any other method) to extract the actual link?