-1

I have a big text file, let's say:

random text Blabla blabla <aaa@gmail.com> bliblibli vlavlavla "bbb@hotmail.com" kakaka lolol <ccc@outlook.su.org> mamama pfdsfsdf random text

I want to read it, extract all the emails from it, and store them in an array.

The emails are always enclosed in <> or in "". They always contain an @, but they don't always end with .com. Sometimes the domain has several dots (as in the .su.org example).

How can I do that in Python, please?

I tried this:

filepath = 'C://PROGRAMING//Outlook//test1.CSV'
with open(filepath) as file:
    lines = file.readlines()
    print(lines)

It shows all the text, but in between I need to add something that keeps only the information I want (the emails) and stores it in an array.

snakecharmerb

1 Answer

-1

You could use RegEx to extract your emails from your files; see this documentation: https://www.w3schools.com/python/python_regex.asp

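To find the emails, capture the text that contains an @ and sits between < and > or between two " characters. Since lines is a list (it comes from readlines()), join it into a single string first, then use re.findall to collect every match:

import re

text = ''.join(lines)  # readlines() returns a list; re needs a single string
# Capture any run containing an @ that is enclosed in <...> or "..."
emails = re.findall(r'[<"]([^<>"\s@]+@[^<>"\s@]+)[>"]', text)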

I'll let you test that solution.
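As a rough sketch of the idea above, here is the extraction applied to the sample line from the question, which stands in for the file contents. The pattern and the names text, EMAIL_PATTERN and emails are illustrative choices, not the only way to do this:

```python
import re

# Sample line from the question, standing in for the file contents.
text = ('random text Blabla blabla <aaa@gmail.com> bliblibli vlavlavla '
        '"bbb@hotmail.com" kakaka lolol <ccc@outlook.su.org> mamama pfdsfsdf random text')

# An opening < or ", then a captured run that contains exactly one @ and no
# brackets, quotes or whitespace, then a closing > or ".
EMAIL_PATTERN = re.compile(r'[<"]([^<>"\s@]+@[^<>"\s@]+)[>"]')

# findall returns only the captured group, i.e. the address without delimiters.
emails = EMAIL_PATTERN.findall(text)
print(emails)  # ['aaa@gmail.com', 'bbb@hotmail.com', 'ccc@outlook.su.org']
```

Note that this pattern does not check that the opening and closing delimiters match (it would also accept <aaa@gmail.com"), which seems acceptable given the input described in the question.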

  • Thank you for the RegEx tip! I found how to do it:

        import numpy as np
        import re

        filepath = 'C://PROGRAMING//Outlook//test1.CSV'
        with open(filepath) as file:
            lines = file.read()
        email_ids = re.findall(r'[\w\.-]+@[\w\.-]+', lines)
        email_ids = list(set(email_ids))
        email_ids = np.array(email_ids)
        np.savetxt('emails1.txt', email_ids, delimiter=' ', comments='', fmt='%s')

    – User012928477 Jan 31 '23 at 15:23
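A possible variant of the approach in the comment above, sketched with the standard library only. It keeps the same regex but removes duplicates with dict.fromkeys, which preserves the order in which addresses first appear (set does not). The helper names unique_emails and save_emails are hypothetical and used only for illustration:

```python
import re

def unique_emails(text):
    """Return the addresses found in text, without duplicates, in first-seen order."""
    found = re.findall(r'[\w.-]+@[\w.-]+', text)
    # dict keys are unique and keep insertion order, unlike a set.
    return list(dict.fromkeys(found))

def save_emails(emails, out_path):
    """Write one address per line to out_path."""
    with open(out_path, 'w', encoding='utf-8') as out:
        out.write('\n'.join(emails) + '\n')

sample = '<aaa@gmail.com> x "bbb@hotmail.com" y <aaa@gmail.com>'
emails = unique_emails(sample)
print(emails)  # ['aaa@gmail.com', 'bbb@hotmail.com']
```

To write the result to disk, something like save_emails(emails, 'emails1.txt') would produce the same kind of output file as the np.savetxt call, without needing NumPy.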