I have some lines of code that extract email addresses from a PDF file:
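import re

Listemail = []  # assumed to be initialised before the loop
# Compile the pattern once, outside the loop
r = re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')
for page in pdf.pages:
    # Use a separate name so the reader object 'pdf' is not overwritten
    text = page.extractText()
    # print(text)
    results = r.findall(text)
    Listemail.append(results)
print(Listemail[0:])
pdf.stream.close()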
Unfortunately, after running the code I noticed that the results are not quite right, because a 'u' character appears in front of every match:
[[u'testuser1@training.local']]
[[u'testuser2@training.local']]
Does anybody know how to avoid that character appearing?
Thanks in advance
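
For what it is worth, the 'u' prefix is how Python 2 displays a unicode string when a whole list is printed; it is not part of the matched address. Below is a minimal sketch of one way to see the plain addresses, assuming sample strings in place of the text returned by page.extractText(). The names find_emails and sample_pages are hypothetical and not part of any PDF library. It collects the matches into one flat list and prints each element instead of the list itself.

```python
import re

# Same pattern as in the code above
EMAIL_RE = re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')


def find_emails(page_texts):
    """Return one flat list of addresses found in an iterable of page texts."""
    emails = []
    for text in page_texts:
        # extend() keeps the list flat; append() would nest one list per page
        emails.extend(EMAIL_RE.findall(text))
    return emails


# Hypothetical sample standing in for the text of each PDF page
sample_pages = [
    u'Contact: testuser1@training.local',
    u'Contact: testuser2@training.local',
]

emails = find_emails(sample_pages)

# Printing each string (rather than the list) shows it without the u'' wrapper
for address in emails:
    print(address)
```

Printing a list shows the repr of each element, which is where the u'...' notation comes from in Python 2; printing the elements one by one shows only the text.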