I'm trying to open and process MHT files and scrape the dealer location data from them. Whenever I hit a website with 'tricky' HTML formatting, I keep running into the same problem. It turns:
a href="http://www.google.com/maps?s=123 main st"......
into
a href="http://www.=
google.com/maps?=12=
3 main st"
Nothing I have tried so far has restored the line to its original form, so I still can't pull the address out.
a = a.replace(r'=\n', '')
or
a = a.replace(r'\n', '')
or even tried,
a = a.replace(r'[0D]', '')
and just tried,
a = a.sub(r'\n', '')
and all I got was the error: 'str' object has no attribute 'sub'. It does the same thing with or without the 'r' in the code.
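For what it's worth, here is a small sketch of how these attempts behave on a made-up sample string. The sample text and variable names are assumptions for illustration only, not the contents of the real file. It suggests that the r prefix turns '\n' into a backslash followed by 'n' for str.replace, and that sub belongs to the re module rather than to str:

```python
import re

# Hypothetical sample mimicking the wrapped link above.
text = 'a href="http://www.=\ngoogle.com/maps?=12=\n3 main st"'

# r'=\n' is three characters: '=', a backslash, and 'n'.
# It is not '=' followed by a newline, so str.replace finds nothing.
raw_pattern = r'=\n'
print(len(raw_pattern))      # 3
print(raw_pattern in text)   # False

# Without the r prefix, '\n' is a real newline and the replace matches.
joined = text.replace('=\n', '')

# sub is a function of the re module, not a str method. In regex syntax
# r'\n' does mean a newline; '\r?' also covers Windows line endings.
joined_re = re.sub(r'=\r?\n', '', text)

print(joined)
print(joined == joined_re)
```

If the real file uses '\r\n' line endings and is read in binary mode, the plain replace would need to target '=\r\n' instead.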
Nothing has worked thus far. How do I remove the =\n that always pops up whenever I look at an mht file?
I am using this to read the file:
a = open('Filename.mht', 'r')
b = a.read()
a.close()
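A trailing '=' at the end of each wrapped line looks like a quoted-printable soft line break, which MHT files commonly use. Assuming that is the case here, a minimal sketch with the standard library's quopri module might look like the following. The helper names decode_qp and read_mht_text and the sample bytes are hypothetical, chosen only for illustration:

```python
import quopri

def decode_qp(raw_bytes, encoding='utf-8'):
    """Join soft-wrapped lines and decode =XX escapes (quoted-printable)."""
    return quopri.decodestring(raw_bytes).decode(encoding, errors='replace')

def read_mht_text(path):
    """Read a file in binary mode and decode it as quoted-printable.

    Assumes the whole file is quoted-printable text, which may not hold
    for files that also contain base64-encoded parts.
    """
    with open(path, 'rb') as f:
        return decode_qp(f.read())

# Hypothetical sample; '=3D' is how quoted-printable writes a literal '='.
sample = b'a href="http://www.=\r\ngoogle.com/maps?s=3D12=\r\n3 main st"'
print(decode_qp(sample))
```

For a real file this would be called as read_mht_text('Filename.mht'). Since an MHT file is a MIME multipart document, parsing it with email.message_from_bytes and calling get_payload(decode=True) on each part is probably the more robust route, because it decodes each part according to its own transfer encoding.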