I am trying to remove all text between two delimiters, '<' and '>'. I read email content and write it to a .txt file. The file ends up with a lot of junk between those two delimiters, plus blank lines between the lines of text. How do I get rid of both? Below is what my script reads back from the data written to my .txt file:
First Name</td>
<td bgcolor='white' style='padding:5px
!important;'>Austin</td>
</tr><tr>
<td bgcolor='#f9f9f9' style='padding:5px !important;'
valign='top' width=170>Last Name</td>
Below is my current code for reading from the .txt file which strips empty lines:
# Get file contents
fd = open('emailtext.txt','r')
contents = fd.readlines()
fd.close()
new_contents = []
# Get rid of empty lines
for line in contents:
    # Strip whitespace, should leave nothing if empty line was just "\n"
    if not line.strip():
        continue
    # We got something, save it
    else:
        new_contents.append(line)
for element in new_contents:
    print element
Here is the expected output:
First Name Austin
Last Name Jones
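For illustration, here is a minimal sketch of one way this could be approached with a regular expression. It is not a robust HTML parser, and it rests on a few assumptions: the whole file is handled as one string (the tags in the sample span several lines, so a line-by-line pass would miss them), each `</tr>` marks the end of one output line, and no attribute value contains a literal '>'. The name `strip_tags` is hypothetical, and the sample string is copied from the excerpt above.

```python
import re

# Anything between '<' and '>'. The negated class also matches newlines,
# so tags that wrap across lines are removed in one piece.
TAG = re.compile(r'<[^>]*>')
# Assumption: a closing </tr> ends one logical row of output.
ROW_END = re.compile(r'</tr\s*>', re.IGNORECASE)


def strip_tags(text):
    """Return one cleaned line per table row, without tags or extra whitespace."""
    lines = []
    for row in ROW_END.split(text):
        no_tags = TAG.sub(' ', row)          # replace each tag with a space
        cleaned = ' '.join(no_tags.split())  # collapse newlines and runs of spaces
        if cleaned:                          # skip rows that were only markup
            lines.append(cleaned)
    return lines


sample = (
    "First Name</td>\n"
    "<td bgcolor='white' style='padding:5px\n"
    "!important;'>Austin</td>\n"
    "</tr><tr>\n"
    "<td bgcolor='#f9f9f9' style='padding:5px !important;'\n"
    "valign='top' width=170>Last Name</td>\n"
)

for line in strip_tags(sample):
    print(line)
```

On the excerpt above this prints "First Name Austin" followed by "Last Name" (the excerpt is cut off before the second value). To run it on the real file, the contents could be read with `fd.read()` instead of `fd.readlines()` and passed to `strip_tags`. If the emails contain more varied HTML, a real parser such as the standard library's HTMLParser would likely be more reliable than a regex.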