I've been trying to clean up a page fetched from Craigslist (CL) so I can send it to my email,
but this is what I get every time I try to remove the \n and \t sequences:
b'\n\n\n\t\n\t\n\t\n\t\n\t\n\t\n\n\n\n\t\n\n\n\t
\n\t\t\t
\n\t
\n\t\t
\n\t\t\t
\n 0 favorites\n
\n\n\t\t
\n\t\t
∨
\n\t\t
∧
\n\t\t
\n \n
\n
\n\t \tCL wenatchee all personals casual encounters\n
\n
\n\t\t
\n\t
\n
\n\n\t\t
\n\t\t\t
\n\t\n\t\t\n\t\n\n\n\nReply to: 59nv6-4031116628@pers.craigslist.org\n
\n\n\n\t
\n\t\n\t\tflag [?] :\n\t\t\n\t\t\tmiscategorized\n\t\t\n\t\t\tprohibited\n\t\t\n\t\t\tspam\n\t\t\n\t\t\tbest of\n\t\n
\n\n\t\t
Posted: 2013-08-28, 8:23AM PDT
\n
\n\n
\n \n Well... - w4m - 22 (Wenatchee)\n
I have tried strip, replace, and even a regex, but nothing affects it; the text always arrives in my email unchanged.
Here's the code:
import re
import smtplib
from email.mime.text import MIMEText
from urllib.request import urlopen

try:
    # Turn a relative link into an absolute URL
    if url.find('http://') == -1:
        url = 'http://wenatchee.craigslist.org' + url
    html = urlopen(url).read()
    html = str(html)
    html = re.sub('\s+',' ', html)
    print(html)
    part2 = MIMEText(html, 'html')
    msg.attach(part2)
    s = smtplib.SMTP('localhost')
    s.sendmail(me, you, msg.as_string())
    s.quit()
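
One likely cause, going by the b' at the start of the output: in Python 3, urlopen(...).read() returns bytes, and calling str() on bytes gives the printable representation, where each newline becomes the two characters backslash and n. A whitespace regex then has nothing to match. The sketch below is only an illustration of that effect with made-up sample bytes (raw is a stand-in, not the real page), and the UTF-8 charset is an assumption.

```python
import re

# Stand-in for what urlopen(url).read() returns: a bytes object.
# This is sample data for illustration, not the real page.
raw = b"\n\n\t\n\tReply to:\n\t\tsomeone\n"

# str() on bytes yields its repr: a backslash followed by 'n' or 't',
# not real whitespace, so \s+ finds nothing new to collapse.
as_repr = str(raw)
collapsed_repr = re.sub(r"\s+", " ", as_repr)

# Decoding yields real newline and tab characters, which \s+ does match.
# UTF-8 is an assumption; the page's actual charset may differ.
text = raw.decode("utf-8", errors="replace")
collapsed_text = re.sub(r"\s+", " ", text).strip()

print(collapsed_repr)
print(collapsed_text)
```

If that is what is happening here, replacing html = str(html) with a decode call before the re.sub should let the whitespace cleanup take effect.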