Appending multiple for-loop outputs to a list

Question

I am using RegEx to extract some data from a txt file. I've written the for-loops below to extract emails and birthdates and tried to append the outputs to a list. But when I print my list, only the output appended by the first loop is printed. The birthdate RegEx works fine when run by itself. I'm sure I'm doing something very basic wrong.

f = open("/Users/me/Desktop/scrape.txt", "r", encoding="utf8")

list = []

for i in f:
    if re.findall(r"((?i)[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.])", i):
        list.append(i)

for k in f:
    if re.findall(r'\d\d-\d\d-\d\d\d\d', k):
        list.append(k)

print(list)
f.close()

Not an answer but just noticing that you are using the case-insensitive modifier `(?i)` in your first pattern. So you could get rid of `A-Z`. Also in your second regex > `\d\d\d\d` is better written `\d{4}` — JvdV, Apr 10 '20 at 14:17
Does this answer your question? [Read multiple times lines of the same file Python](https://stackoverflow.com/questions/26294912/read-multiple-times-lines-of-the-same-file-python) — azro, Apr 10 '20 at 14:17
your iterator `f` has reached the end of file (EOF) already when you're entering the second loop. So you either need to do `f.seek(0)` before the second loop, or just `|` two regexes, I think piping two regexes should work just fine — Javed, Apr 10 '20 at 14:18

abhinonymous · Answer 1 · 2020-04-10T14:21:02.543

Try this:

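import re

list = []

with open("/Users/me/Desktop/scrape.txt", "r", encoding="utf8") as f:
    for i in f:  # visit every line once and run both checks on it
        # (?i) moved to the start of the pattern; Python 3.11+ rejects it elsewhere
        if re.findall(r"(?i)([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.])", i):
            list.append(i)
        if re.findall(r'\d\d-\d\d-\d\d\d\d', i):
            list.append(i)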

In your code, after the first for loop, the file object f has reached the end of the file, so the second for loop has nothing left to read and its body never runs.

To make your code do what you intended, you could close the file after the first loop and reopen it before the second loop, so that f starts reading from the beginning of the file again:

import re

f = open("/Users/me/Desktop/scrape.txt", "r", encoding="utf8")

list = []

for i in f:
    # (?i) moved to the start of the pattern; Python 3.11+ rejects it elsewhere
    if re.findall(r"(?i)([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.])", i):
        list.append(i)

f.close()

f = open("/Users/me/Desktop/scrape.txt", "r", encoding="utf8")
for k in f:
    if re.findall(r'\d\d-\d\d-\d\d\d\d', k):
        list.append(k)

print(list)
f.close()
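
As an alternative to closing and reopening, the file position can be rewound with f.seek(0), as suggested in the comments on the question. Below is a minimal sketch under some assumptions: the names EMAIL_RE, DATE_RE and collect_two_passes are made up for illustration, the sample text is invented, and an in-memory io.StringIO stands in for the real file. The email pattern is a simplified variant of the one in the question; it passes re.IGNORECASE instead of embedding (?i) mid-pattern, because Python 3.11 and later reject an inline global flag that is not at the start of the pattern.

```python
import io
import re

# Simplified, illustrative patterns (not a full email or date validator).
EMAIL_RE = re.compile(r"[a-z0-9_.+-]+@[a-z0-9-]+\.[a-z0-9.-]+", re.IGNORECASE)
DATE_RE = re.compile(r"\d\d-\d\d-\d{4}")


def collect_two_passes(f):
    """Scan an open, seekable text file twice, rewinding between passes."""
    matches = []
    for line in f:  # first pass: lines that contain an email address
        if EMAIL_RE.search(line):
            matches.append(line)
    f.seek(0)  # rewind; without this the second loop starts at end of file
    for line in f:  # second pass: lines that contain a dd-dd-dddd date
        if DATE_RE.search(line):
            matches.append(line)
    return matches


# Invented sample data standing in for scrape.txt.
sample = io.StringIO("jane@example.com\n10-04-2020\nno match here\n")
print(collect_two_passes(sample))
```

The same function should work with a file opened via open(...), since regular text files are seekable as well.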

Please, when answering, explain the OP's error and how your code fixes it. The main goal of SO is to help people learn, not to hand out code that just works – azro, Apr 10 '20 at 14:18

Lydia van Dyke · Accepted Answer · 2020-04-10T15:48:01.913


You are trying to read the same file twice. The first loop consumes the file, so the second for-loop has nothing left to iterate over and does nothing. Have a look at this to understand:

f = open("/Users/me/Desktop/scrape.txt", "r", encoding="utf8")
print(list(f))
print("second time:")
print(list(f))

Output:

['1234567890abcdefghijklmopqrstuvwxyz'] # or whatever your content is :)
second time:
[]

To fix this, you can store the lines of the file in a list (if you are not dealing with huge files, of course):

f = open("/Users/me/Desktop/scrape.txt", "r", encoding="utf8")
content = list(f)


for i in content:
    ...

for k in content:
    ...

In your specific example it would be cleaner (and faster) to do all processing in a single for-loop, though. However, the mistake was to try to read twice from the same file without resetting it.
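
The single-loop idea could look roughly like the sketch below. It is only an illustration: EMAIL_RE, DATE_RE and collect_single_pass are hypothetical names, the sample lines are invented, and the email pattern is a simplified variant of the one in the question that uses re.IGNORECASE rather than an inline (?i). Keeping two separate lists and concatenating them at the end preserves the order the two-loop version would have produced (all email lines first, then all date lines).

```python
import re

# Simplified, illustrative patterns (not a full email or date validator).
EMAIL_RE = re.compile(r"[a-z0-9_.+-]+@[a-z0-9-]+\.[a-z0-9.-]+", re.IGNORECASE)
DATE_RE = re.compile(r"\d\d-\d\d-\d{4}")


def collect_single_pass(lines):
    """Return (email_lines, date_lines) after reading `lines` exactly once."""
    email_lines, date_lines = [], []
    for line in lines:
        if EMAIL_RE.search(line):
            email_lines.append(line)
        if DATE_RE.search(line):
            date_lines.append(line)
    return email_lines, date_lines


# Invented sample data; an open file object can be passed instead of a list,
# because iterating over a file yields its lines one at a time.
sample = ["jane@example.com\n", "born 10-04-2020\n", "nothing here\n"]
emails, dates = collect_single_pass(sample)
print(emails + dates)
```

Because the file is read only once, nothing has to be rewound or held in memory beyond the matching lines.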

edited Apr 10 '20 at 15:48

answered Apr 10 '20 at 14:20


Note of caution, if the file is large, storing it as a list can result in size of list being HUGE. – abhinonymous Apr 10 '20 at 15:38
True. I just hoped the list of emails and birthdays is not in the order of millions. – Lydia van Dyke Apr 10 '20 at 15:46
@abhinonymous : added a note about this. – Lydia van Dyke Apr 10 '20 at 15:48
Imagine doing that over a wiki dump, I'm sure someone has done that at some point of time :) – abhinonymous Apr 10 '20 at 15:49
