I am examining phishing emails from an .mbox file and trying to extract the links. I am using Python's BeautifulSoup module to extract all <a> tags, and it is producing strange results. When I examine the file manually, I can see that the first tag is this:
<a href=3D"http://pages.ebay.com/"><img
src=3D"http://pics.ebaystatic.com/aw/pics/register/HeaderRegister_387x40.g=
if"
alt=3D"From collectibles to cars, buy and sell all kinds of items on eBay"=
border=3D"0"></a>
Please do not follow the link; it is a phishing email!
When I call find_all('a') on the BeautifulSoup object, the first result is this:
<a href='3D"http://pages.ebay.com/"'><img all="" alt='3D"From' and="" buy="" cars,="" collectibles="" ebay"='border=3D"0"' if"="" items="" kinds="" of="" on="" sell="" src='3D"http://pics.ebaystatic.com/aw/pics/register/HeaderRegister_387x40.g=' to=""/></a>
Can anyone help me understand why I get this result, and how I can get the actual tag from the email without using Python's re module?
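For context, the =3D sequences and the bare = at the ends of lines look like quoted-printable transfer encoding, where =3D stands for a literal = and a trailing = marks a soft line break. If that is what is happening here, the HTML would need to be decoded before it is parsed. The following is only a sketch under that assumption: it uses the standard-library quopri module on the fragment shown above and a small hypothetical LinkCollector class (not part of BeautifulSoup) to pull out the href values.

```python
import quopri
from html.parser import HTMLParser

# The raw fragment as it appears in the .mbox file. Assumption: the message
# part is quoted-printable encoded, which the "=3D" and trailing "=" suggest.
raw = (
    b'<a href=3D"http://pages.ebay.com/"><img\n'
    b'src=3D"http://pics.ebaystatic.com/aw/pics/register/HeaderRegister_387x40.g=\n'
    b'if"\n'
    b'alt=3D"From collectibles to cars, buy and sell all kinds of items on eBay"=\n'
    b'border=3D"0"></a>\n'
)

# Undo the transfer encoding: "=3D" becomes "=", and "=" followed by a
# newline (a soft line break) is removed so split values are rejoined.
decoded = quopri.decodestring(raw).decode("ascii")


class LinkCollector(HTMLParser):
    """Hypothetical helper that records the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")


collector = LinkCollector()
collector.feed(decoded)
print(decoded)
print(collector.links)
```

The decoded string could equally be passed to BeautifulSoup in place of the raw text. When reading the messages through the standard mailbox and email modules, calling get_payload(decode=True) on the HTML part should apply the same decoding based on the part's Content-Transfer-Encoding header, so the manual quopri step may not be needed.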