Strange output from python BeautifulSoup

Question

I am examining phishing emails from an .mbox file and trying to extract the links. I am using Python's BeautifulSoup module to extract all `<a>` tags, but it is producing strange results. When I manually examine the file, I can see that the first tag is this:

<a href=3D"http://pages.ebay.com/"><img
src=3D"http://pics.ebaystatic.com/aw/pics/register/HeaderRegister_387x40.g=
if"
alt=3D"From collectibles to cars, buy and sell all kinds of items on eBay"=

border=3D"0"></a>

Please do not follow the link; it comes from a phishing email!

When I call BeautifulSoup's find_all('a'), the first result is this:

<a href='3D"http://pages.ebay.com/"'><img all="" alt='3D"From' and="" buy="" cars,="" collectibles="" ebay"='border=3D"0"' if"="" items="" kinds="" of="" on="" sell="" src='3D"http://pics.ebaystatic.com/aw/pics/register/HeaderRegister_387x40.g=' to=""/></a>

Can anyone help me understand why I get this result, and how I can get the actual tag from the email without using Python's re module?

[What's a 3D doing in this HTML?](https://stackoverflow.com/questions/4016067/whats-a-3d-doing-in-this-html) - Maybe the encoding is messing with BeautifulSoup? Please make the phishing link warning more bold/visible. — shriakhilc, Dec 29 '21 at 06:19
Is that not the tag you are looking for? You didn't provide counterexamples. You wanted `<a>` tags, you got `<a>` tags. — weasel, Dec 29 '21 at 06:49
What do you mean by "the actual tag"? Do you mean the `href` attribute from the anchor element? — Grismar, Dec 29 '21 at 06:52
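
Following up on the encoding suggestion in the comments: the `=3D` sequences and the trailing `=` at line ends look like quoted-printable transfer encoding, where `=3D` stands for a literal `=` and a trailing `=` is a soft line break. Below is a minimal sketch, assuming the fragment really is quoted-printable, that decodes it with the standard-library `quopri` module before any HTML parsing. The names `RAW`, `decode_qp`, `LinkCollector`, and `extract_links` are hypothetical, and the stdlib `html.parser` is swapped in for BeautifulSoup only to keep the example dependency-free.

```python
import quopri
from html.parser import HTMLParser

# The fragment from the question, still quoted-printable encoded
# (assumption: this is how it appears in the raw .mbox body).
RAW = (
    b'<a href=3D"http://pages.ebay.com/"><img\n'
    b'src=3D"http://pics.ebaystatic.com/aw/pics/register/HeaderRegister_387x40.g=\n'
    b'if"\n'
    b'alt=3D"From collectibles to cars, buy and sell all kinds of items on eBay"=\n'
    b'\n'
    b'border=3D"0"></a>\n'
)


def decode_qp(raw):
    """Undo quoted-printable encoding: '=3D' becomes '=', soft breaks are joined."""
    return quopri.decodestring(raw).decode("utf-8", errors="replace")


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(raw):
    """Decode the quoted-printable bytes first, then parse the resulting HTML."""
    parser = LinkCollector()
    parser.feed(decode_qp(raw))
    parser.close()
    return parser.links


if __name__ == "__main__":
    print(decode_qp(RAW))
    print(extract_links(RAW))
```

The decoded string can be handed to BeautifulSoup in the same way, for example `BeautifulSoup(decode_qp(RAW), "html.parser").find_all("a")`. When the message is loaded through the standard `mailbox` or `email` modules, calling `get_payload(decode=True)` on the HTML part should undo the transfer encoding as well, provided the part declares `Content-Transfer-Encoding: quoted-printable`.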
