1

I have a txt file containing email addresses along with other lines that are not valid emails. I am trying to print only the valid email addresses, but when I use the code below, nothing is printed. This is the content of the txt file:

blbabal@gmail.com   
hey@gmail.com

lalalalal

In this case, only the two email addresses should be printed.

import re

my_file = open('emails.txt', 'r+')
  • Add `re.M` flag, `re.findall(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$", my_file.read(), re.M)` – Wiktor Stribiżew Mar 05 '19 at 11:58
  • It is very similar to this question: https://stackoverflow.com/q/6186938/4636715 except you specifically look for email addresses. But as your point is not the regex you've built, it can be considered as a dupe. – vahdet Mar 05 '19 at 12:02
  • @vahdet It is not similar to that question. Here, the whole line must match a pattern. – Wiktor Stribiżew Mar 05 '19 at 12:03
  • Nothing is printed because the `for` loop is iterating over the file, which `.read()` has already consumed to the end. Why aren't you iterating over `items` instead? –  Mar 05 '19 at 12:06
  • You are looking for matches, storing them in `items`, and in the very next line you are overwriting `items`. – Klaus D. Mar 05 '19 at 12:08

3 Answers

1

You can fix your code by adding the re.M flag:

re.findall(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$", my_file.read(), re.M)

Since you read in the whole file with my_file.read(), ^ and $ need to match at the start and end of each line rather than of the whole string, and the re.M flag makes them do that.
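A minimal sketch of the difference, using an in-memory string as a stand-in for the contents of emails.txt (the sample text is an assumption, and it has no trailing whitespace on any line):

```python
import re

# Hypothetical stand-in for what my_file.read() would return.
text = "blbabal@gmail.com\nhey@gmail.com\n\nlalalalal\n"
pattern = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$"

# Without re.M, ^ and $ anchor to the whole string, so nothing matches.
without_flag = re.findall(pattern, text)

# With re.M, ^ and $ anchor to the start and end of each line.
with_flag = re.findall(pattern, text, re.M)

print(without_flag)
print(with_flag)
```

The only change between the two calls is the flag, which is why the same pattern goes from no matches to one match per valid line.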

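Alternatively, read the file line by line and keep only the lines that fully match your pattern:

import re

items = []
# Only $ is needed: re.match already anchors at the start of the string.
email_rx = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$")
with open('emails.txt', 'r') as my_file:  # read-only access is enough
    for line in my_file:
        if email_rx.match(line):
            items.append(line)  # the line keeps its trailing newline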

Only the $ anchor is needed here, because re.match already anchors the match at the start of the string.

If your lines may end with CRLF or other trailing whitespace, either rstrip each line before testing it against the regex and appending it to items, or add a \s* pattern just before the $ anchor.
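Both options can be sketched as follows. The sample lines, the variable names, and the capture group in the second pattern are assumptions added for illustration:

```python
import re

# Hypothetical sample lines: one with trailing spaces, one with a CRLF ending.
lines = ["blbabal@gmail.com   \n", "hey@gmail.com\r\n", "\n", "lalalalal\n"]

# Option 1: strip trailing whitespace, then require the rest to match up to $.
strict_rx = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$")
stripped = [line.rstrip() for line in lines if strict_rx.match(line.rstrip())]

# Option 2: leave the line untouched and let \s* absorb trailing whitespace.
# The group captures the address without that whitespace.
lenient_rx = re.compile(r"([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+)\s*$")
captured = [m.group(1) for m in map(lenient_rx.match, lines) if m]

print(stripped)
print(captured)
```

Option 1 keeps the pattern simple, while option 2 keeps all the tolerance inside the regex. Either way the collected items no longer carry line endings.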

Wiktor Stribiżew
0
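import re

# Read the whole file and collect every email-like substring (no anchors).
with open('emails.txt', 'r') as my_file:
    items = re.findall(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+", my_file.read())

# Iterate over the matches, not over the file object.
for item in items:
    print(item)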

Two problems:

  1. Iterate with for item in items rather than over the file.
  2. Remove ^ and $ from your pattern.
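The first point follows from how file objects track their read position. Here is a small sketch, assuming io.StringIO as a stand-in for the opened file so the example runs without emails.txt on disk:

```python
import io

# io.StringIO stands in for open('emails.txt'); it keeps a read position
# in the same way a real text file object does.
my_file = io.StringIO("blbabal@gmail.com\nhey@gmail.com\n")

contents = my_file.read()              # consumes everything up to the end
leftover = [line for line in my_file]  # nothing is left to iterate over

my_file.seek(0)                        # rewind to the start
rewound = [line for line in my_file]   # now the lines are available again

print(leftover)
print(rewound)
```

Looping over the list returned by re.findall avoids the issue entirely, since that list does not depend on the file position.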
thavan
  • If you remove the anchors, email-like substrings that are only part of a line will get extracted, too. OP used the anchors for a reason. – Wiktor Stribiżew Mar 05 '19 at 13:11
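To illustrate the point of this comment, here is a short sketch with a hypothetical input in which the second line contains an email but is not only an email:

```python
import re

# Hypothetical input: line 1 is an email, line 2 merely contains one.
text = "hey@gmail.com\nwrite to hey@gmail.com today\n"
email = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+"

# Unanchored: every email-like substring is extracted.
unanchored = re.findall(email, text)

# Anchored with re.M: only lines that are entirely an email are returned.
anchored = re.findall("^" + email + "$", text, re.M)

print(unanchored)
print(anchored)
```

Which behaviour is right depends on whether partial lines should count. The question asks for lines that are valid emails, which matches the anchored version.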
0

This should print every line in the file that contains an email:

import re

reg = r'[A-Za-z0-9.]+@[A-Za-z0-9]+[.][a-z]+'
with open('emails.txt', 'r') as f1:
    for email in f1:
        # re.search finds the pattern anywhere in the line
        if re.search(reg, email):
            print(email.rstrip('\n'))

And this should print only the lines that consist entirely of an email:

import re

reg = r'[A-Za-z0-9.]+@[A-Za-z0-9]+[.][a-z]+'
with open('emails.txt', 'r') as f1:
    for email in f1:
        candidate = email.strip()  # drop the newline and surrounding whitespace
        # re.fullmatch succeeds only if the entire string matches the pattern
        if re.fullmatch(reg, candidate):
            print(candidate)
Shahir Ansari