regex using python language

Question

I have a txt file containing email addresses mixed with other lines that are not valid emails. I am trying to print only the valid email addresses, but when I use the code below, nothing is printed. This is the content of the txt file:

blbabal@gmail.com   
hey@gmail.com

lalalalal

In this case, only the two email addresses should be printed.

    import re

    my_file = open('emails.txt', 'r+')

Add `re.M` flag, `re.findall(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$", my_file.read(), re.M)` — Wiktor Stribiżew, Mar 05 '19 at 11:58
It is very similar to this question: https://stackoverflow.com/q/6186938/4636715 except you specifically look for email addresses. But as your point is not the regex you've built, it can be considered as a dupe. — vahdet, Mar 05 '19 at 12:02
@vahdet It is not similar to that question. Here, the whole line must match a pattern. — Wiktor Stribiżew, Mar 05 '19 at 12:03
nothing is printed because the `for` loop is iterating over the file, which has already seeked to the end with `.read()`. why aren't you iterating over `items` instead? — , Mar 05 '19 at 12:06
You are looking for matches, storing them in `items`, and in the very next line you are overwriting `items`. — Klaus D., Mar 05 '19 at 12:08
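
A minimal sketch of the point made in the last two comments, assuming an in-memory `io.StringIO` object as a stand-in for `emails.txt` (the name `fake_file` is hypothetical): once `.read()` has consumed the stream, iterating over the same file object yields nothing until it is rewound.

```python
import io

# In-memory stand-in for emails.txt (contents taken from the question).
fake_file = io.StringIO("blbabal@gmail.com   \nhey@gmail.com\n\nlalalalal\n")

text = fake_file.read()             # consumes the whole stream
leftover = list(fake_file)          # position is at the end, so nothing is left

fake_file.seek(0)                   # rewind to the beginning
lines_after_seek = list(fake_file)  # now all four lines are available again

print(leftover)                     # []
print(len(lines_after_seek))        # 4
```

Real file objects returned by `open()` behave the same way, which is why a `for` loop over the file after `.read()` prints nothing.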

score 1 · Answer 1 · answered Mar 05 '19 at 12:06

You can fix your code by adding the re.M flag:

re.findall(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$", my_file.read(), re.M)

Since you read the whole file into a single string with my_file.read(), ^ and $ need to match at the start and end of each line rather than of the whole string, and the re.M flag does exactly that.
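
A small sketch of the difference the flag makes, run on an in-memory string instead of the file. The sample text is the question's data with trailing spaces removed, which is an assumption made here so that both addresses can match; the names `EMAIL_RX`, `without_flag` and `with_flag` are illustrative.

```python
import re

EMAIL_RX = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$"

# Assumed sample: the question's lines without trailing whitespace.
text = "blbabal@gmail.com\nhey@gmail.com\n\nlalalalal\n"

# Without re.M, ^ and $ refer to the whole string, so nothing matches.
without_flag = re.findall(EMAIL_RX, text)

# With re.M, ^ and $ also match at the start and end of every line.
with_flag = re.findall(EMAIL_RX, text, re.M)

print(without_flag)  # []
print(with_flag)     # ['blbabal@gmail.com', 'hey@gmail.com']
```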

Also, you may read the file line by line and only get those lines that fully match your pattern:

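import re

items = []
# re.match already anchors at the start of each line; $ anchors the end.
email_rx = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+$")
with open('emails.txt', 'r') as my_file:
    for line in my_file:
        if email_rx.match(line):
            items.append(line)  # the line still carries its trailing newline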

Note that only the $ anchor is necessary, since re.match already anchors the match at the start of the string.

Note that if the lines have CRLF endings or trailing whitespace, you can either rstrip each line before testing it against the regex and appending it to items, or add a \s* pattern at the end, just before the $ anchor.
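
A hedged sketch of the rstrip variant. It swaps `re.match` plus `$` for `re.fullmatch` (available since Python 3.4), which is a substitution made here for brevity rather than what the answer above uses. The helper name `valid_emails` and the sample list are hypothetical.

```python
import re

email_rx = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+")

def valid_emails(lines):
    """Return the lines that consist of an email-like string and nothing else."""
    items = []
    for line in lines:
        candidate = line.rstrip()  # drops "\n", "\r\n" and trailing spaces
        if email_rx.fullmatch(candidate):
            items.append(candidate)
    return items

# Assumed sample: the question's data, with a CRLF ending on the first line.
sample = ["blbabal@gmail.com   \r\n", "hey@gmail.com\n", "\n", "lalalalal\n"]
print(valid_emails(sample))  # ['blbabal@gmail.com', 'hey@gmail.com']
```

Any iterable of lines works as the argument, so an open file object can be passed in directly: `valid_emails(my_file)`.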

score 0 · Answer 2 · answered Mar 05 '19 at 12:11

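import re

with open('emails.txt', 'r') as my_file:
    # Unanchored pattern: finds email-like substrings anywhere in the text.
    items = re.findall(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+", my_file.read())

for item in items:  # iterate over the matches, not over the file
    print(item)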

Two problems:

Iterate with for item in items instead of iterating over the file.
Remove ^ and $ from your pattern.

thavan


If you remove the anchors, email-like substrings that do not make up the whole line will get extracted, too. OP used the anchors for a reason. – Wiktor Stribiżew Mar 05 '19 at 13:11
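
To illustrate the objection in this comment, here is a small sketch comparing the unanchored pattern with the anchored one plus `re.M`. The sample text and the address `foo@example.com` are made up for the example and do not come from the question.

```python
import re

pattern = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+"

# Hypothetical input: the first line only contains an address, the second line is one.
text = "contact: foo@example.com today\nhey@gmail.com\n"

unanchored = re.findall(pattern, text)
anchored = re.findall(r"^" + pattern + r"$", text, re.M)

print(unanchored)  # ['foo@example.com', 'hey@gmail.com']
print(anchored)    # ['hey@gmail.com']
```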

Shahir Ansari · Answer 3 · 2019-03-06T05:33:00.273

0

This should print every line in the file that contains an email:

import re

reg = r'[A-Za-z0-9.]+@[A-Za-z0-9]+[.][a-z]+'
with open('emails.txt', 'r') as f1:
    for email in f1:
        # re.search finds the pattern anywhere in the line
        if re.search(reg, email):
            print(email.rstrip('\n'))

And this should print only the lines that consist entirely of an email:

import re

reg = r'[A-Za-z0-9.]+@[A-Za-z0-9]+[.][a-z]+'
with open('emails.txt', 'r') as f1:
    for email in f1:
        line = email.rstrip('\n')
        found = re.findall(reg, line)
        # keep the line only if the first match spans the whole line
        if found and len(found[0]) == len(line):
            print(line)

edited Mar 06 '19 at 05:33

answered Mar 05 '19 at 12:40


OP only wants those emails that are equal to whole lines. – Wiktor Stribiżew Mar 05 '19 at 13:11
Check the second part of the code, which will get the lines that contain only a whole email. – Shahir Ansari Mar 06 '19 at 05:33
There is a more straightforward approach, see my answer. – Wiktor Stribiżew Mar 06 '19 at 08:24
