
I am examining phishing emails from an .mbox file and trying to extract the links. I am using Python's BeautifulSoup module to extract all `<a>` tags, but it is producing strange results. When I examine the file manually, I can see that the first tag is this...

<a href=3D"http://pages.ebay.com/"><img
src=3D"http://pics.ebaystatic.com/aw/pics/register/HeaderRegister_387x40.g=
if"
alt=3D"From collectibles to cars, buy and sell all kinds of items on eBay"=

border=3D"0"></a>

Please do not follow the link; it comes from a phishing email!

When I call BeautifulSoup's find_all('a'), the first result is this...

<a href='3D"http://pages.ebay.com/"'><img all="" alt='3D"From' and="" buy="" cars,="" collectibles="" ebay"='border=3D"0"' if"="" items="" kinds="" of="" on="" sell="" src='3D"http://pics.ebaystatic.com/aw/pics/register/HeaderRegister_387x40.g=' to=""/></a>

Can anyone help me understand why I get this result, and how I can get the actual tag from the email without using Python's re module?

Asyu7
  • [What's a 3D doing in this HTML?](https://stackoverflow.com/questions/4016067/whats-a-3d-doing-in-this-html) - Maybe the encoding is messing with BeautifulSoup? Please make the phishing link warning more bold/visible. – shriakhilc Dec 29 '21 at 06:19
  • Is that not the tag you are looking for? You didn't provide counterexamples. You wanted `<a>` tags, you got `<a>` tags. – weasel Dec 29 '21 at 06:49
  • What do you mean by "the actual tag"? Do you mean the `href` attribute from the anchor element? – Grismar Dec 29 '21 at 06:52
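
Following the encoding hint in the first comment: the `=3D` sequences and the trailing `=` at line ends look like quoted-printable encoding, where `=3D` stands for a literal `=` and a trailing `=` is a soft line break. Parsing the raw text therefore splits the attributes in odd places. Below is a minimal sketch, assuming the body really is quoted-printable; it decodes the snippet from the question with the standard-library `quopri` module and then collects the `href` values. The `LinkCollector` class is a hypothetical helper written for this illustration and uses the standard-library `html.parser` so the example has no third-party dependencies.

```python
import quopri
from html.parser import HTMLParser

# Raw body text as it appears in the .mbox file (assumed quoted-printable).
raw = (
    '<a href=3D"http://pages.ebay.com/"><img\n'
    'src=3D"http://pics.ebaystatic.com/aw/pics/register/HeaderRegister_387x40.g=\n'
    'if"\n'
    'alt=3D"From collectibles to cars, buy and sell all kinds of items on eBay"=\n'
    '\n'
    'border=3D"0"></a>\n'
)

# Decode: "=3D" becomes "=", and a trailing "=" joins the line with the next one.
decoded = quopri.decodestring(raw.encode("ascii")).decode("ascii")


class LinkCollector(HTMLParser):
    """Hypothetical helper: gathers the href of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


collector = LinkCollector()
collector.feed(decoded)
links = collector.links

print(decoded)
print(links)
```

The decoded string can be passed to BeautifulSoup in the same way, and `find_all('a')` should then see a normal `href` attribute. When reading messages through the standard `mailbox` and `email` modules, calling `get_payload(decode=True)` on each text part applies the part's `Content-Transfer-Encoding` for you, which is probably more robust than decoding by hand.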

0 Answers