
I want to create a function that reads a CSV file line by line and keeps only the lines that meet two regex conditions. First, a line must contain a Roman numeral (made of the letters I, V, X, L, C, D, M). Second, of the lines that pass the first condition, I need to filter out the ones that match the pattern .od.s
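The two conditions can be checked with a small sketch. It assumes each line starts with a three-digit number, then whitespace, then a Roman numeral followed by a period, as in the sample lines. The names ROMAN, EXCLUDE and keep are hypothetical.

```python
import re

# Assumption: a line starts with three digits, whitespace, then a Roman
# numeral (letters I, V, X, L, C, D, M) immediately followed by a period.
ROMAN = re.compile(r"^\d{3}\s+[IVXLCDM]+\.")
# Second condition from the question: any character, "od", any character, "s".
EXCLUDE = re.compile(r".od.s")

def keep(line):
    """Return True if the line has a Roman numeral and does not match .od.s."""
    return bool(ROMAN.search(line)) and not EXCLUDE.search(line)

samples = ["547 I. Line 1", "479 II. Todos Line 2", "897 Line 3", "879 XI. Line 4"]
for s in samples:
    print(s, "->", keep(s))
```

Under this assumption the trailing \. matters. Without it, "897 Line 3" would also match, because the L of "Line" is itself a valid numeral letter.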

So if I have a csv file like this:

547 I. Line 1 
479 II. Todos Line 2
897 Line 3
879 XI. Line 4

It should only load these lines:

547 I. Line 1 
879 XI. Line 4

So far I have this:

def load_file(file_extension):
    import re
    file = open(file_extension, 'r')
    filter1 = re.compile(r"\d{3}\s+.([.IVXLCDM.]+)")
    filter2 = re.compile(r".od.s")
    final_list = []
    for line in file:
        if re.search(filter1, line):
            if not re.search(filter2, line):
                final_list.append(line)
        return(final_list)
    file.close()

print(load_file('file.csv'))
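A side note on filter1 itself: it still selects the right sample lines, but the unescaped . after \s+ matches any single character. That means it swallows the first numeral letter, so the capture group holds only what follows. The character class also lists the period twice, which is harmless. The sketch below compares it with a hypothetical tighter pattern that captures the whole numeral. That pattern assumes the numeral is always followed by a period. Writing patterns as raw strings (r"...") also avoids the invalid-escape warning that recent Python versions emit for "\d" in a normal string.

```python
import re

# The first pattern from the question, written as a raw string.
filter1 = re.compile(r"\d{3}\s+.([.IVXLCDM.]+)")
# Hypothetical tighter alternative: capture only the numeral before the period.
tighter = re.compile(r"^\d{3}\s+([IVXLCDM]+)\.")

for line in ["547 I. Line 1", "879 XI. Line 4"]:
    # In filter1 the bare "." consumes the first numeral letter,
    # so group(1) starts after it; tighter captures the full numeral.
    print(repr(filter1.search(line).group(1)), repr(tighter.search(line).group(1)))
```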

But it keeps returning an empty list.

I am not sure if this can be done in a single function. I also tried creating two separate functions: one that filters a list with both regex conditions, and another that calls the first function after reading the CSV file. That didn't work either.
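The two-function approach can work too, as long as each return sits after its loop rather than inside it. Here is a sketch under these assumptions: filter_lines and load_file are hypothetical names, and the patterns are the ones from the question written as raw strings. The file is opened with a with block, so it is closed even if an error occurs.

```python
import re

# Patterns from the question, written as raw strings.
INCLUDE = re.compile(r"\d{3}\s+.([.IVXLCDM.]+)")
EXCLUDE = re.compile(r".od.s")

def filter_lines(lines):
    """Hypothetical helper: keep lines matching INCLUDE but not EXCLUDE."""
    result = []
    for line in lines:
        if INCLUDE.search(line) and not EXCLUDE.search(line):
            result.append(line)
    return result  # after the loop, so every line gets checked

def load_file(path):
    """Hypothetical wrapper: read the file and filter its lines."""
    # The with block closes the file even if filter_lines raises.
    with open(path, "r") as f:
        return filter_lines(f)
```

Because filter_lines accepts any iterable of strings, it can be tested on a plain list without touching the file system.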

Daniela D
  • The code creates an empty list `final_list`, then enters the loop `for line in file`, where in the first iteration it possibly adds one value to `final_list`, and then it returns the list, which has either 0 or 1 item at this point. That's it. – mkrieger1 Jan 08 '21 at 01:42
  • Yes, it does! Thank you! – Daniela D Jan 08 '21 at 13:27
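To see the effect described in the first comment in isolation, here is a small sketch with two hypothetical functions that differ only in where the return statement sits.

```python
def return_inside_loop(items):
    result = []
    for item in items:
        result.append(item)
        return result  # exits during the first iteration
    # Reached only if items is empty; the function then returns None.

def return_after_loop(items):
    result = []
    for item in items:
        result.append(item)
    return result  # runs once the loop has finished

print(return_inside_loop(["a", "b", "c"]))
print(return_after_loop(["a", "b", "c"]))
```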

1 Answer


Your return statement is inside your for loop, so the function returns during the first iteration and never looks at the remaining lines. Move the return out of the for loop (dedent it one level). You should also put file.close() before the return statement. Remember, nothing after a return statement is executed, so a file.close() placed after the return never runs.
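Applying those two changes to the code from the question gives roughly the following sketch. The only other changes are raw-string patterns and calling .search on the compiled patterns, which behaves the same as re.search(pattern, line).

```python
import re

def load_file(file_extension):
    file = open(file_extension, 'r')
    filter1 = re.compile(r"\d{3}\s+.([.IVXLCDM.]+)")
    filter2 = re.compile(r".od.s")
    final_list = []
    for line in file:
        if filter1.search(line):
            if not filter2.search(line):
                final_list.append(line)
    file.close()       # close the file before returning
    return final_list  # now outside the for loop
```

With the sample file from the question, this returns the "547 I. Line 1" and "879 XI. Line 4" lines, each still ending in its newline character.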

goalie1998