
I want to iterate over a list of files in Python 3. They are CSV files containing matrices. I want to apply the same transformation to all of them, so my plan was to create a list of the file names, remove every other file in the folder from that list, and then process the relevant ones.

My target files all end with "2m.csv" (e.g. 14-17_CCK_all_2m.csv), and the results at the end of the process will end with "1m.csv". However, when I run the following script in a Jupyter notebook, the resulting list still contains some files ending with "1m.csv" (they were left over from an earlier development cycle).

import os
myfiles = os.listdir()

for item in myfiles:
    if item[-6:] != "2m.csv":
        myfiles.remove(item)

Interestingly, if I test one of the leftover names on a separate line, I get True, so the if statement should have removed it from my list in the script above. It did remove some of them, but not others:

myfiles[1][-6:] != "2m.csv"
>>> True

All the files in question have a very similar name structure. Thanks for your help.

Béla
  • Unsure this is the cause, but in other languages, modifying a list that you're iterating over is either disallowed or can produce inaccurate results. You could try copying the items to a second list, and iterate over one of them while modifying the other. – Adam V Jun 27 '18 at 16:40
  • Have you looked at the exact output of one of these false positives? Can you post a filename where the if check consistently behaves unexpectedly? – LTClipp Jun 27 '18 at 16:42
  • Modifying a list while iterating over it is likely to produce this sort of discrepancy. If you add a `print(item)` call before your `if` statement, you'll probably see that some of the file names never print. The reason is that as you remove items from the list, the remaining items shift down by one index, and the iteration effectively skips over the item that moved into the vacated slot. – David Zemens Jun 27 '18 at 16:45
  • `myfiles = [item for item in myfiles if item[-6:] == '2m.csv']` is what you need. – David Zemens Jun 27 '18 at 16:47
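
The copy-based approach suggested in the first comment above could look roughly like the sketch below. The function name `keep_2m` and the sample names other than 14-17_CCK_all_2m.csv are hypothetical, made up for illustration only.

```python
def keep_2m(filenames):
    """Return the names ending in '2m.csv', using a loop plus a second list."""
    result = list(filenames)           # the copy that gets modified
    for item in filenames:             # the original is only read, never changed
        if not item.endswith("2m.csv"):
            result.remove(item)        # safe: we are not iterating over `result`
    return result


# Hypothetical folder contents for illustration.
sample = ["14-17_CCK_all_2m.csv", "14-17_CCK_all_1m.csv", "notes.txt"]
print(keep_2m(sample))  # ['14-17_CCK_all_2m.csv']
```

Because the loop reads one list and removes from the other, no item is skipped. With real data you would pass `os.listdir()` instead of `sample`.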

4 Answers


It is better to use a list comprehension:

myfiles = [x for x in os.listdir() if x[-6:] == '2m.csv']

I also prefer the endswith() method over slicing:

myfiles = [x for x in os.listdir() if x.endswith('2m.csv')]
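
As a quick check of the endswith() version, here is a small sketch that wraps it in a helper so it can be tried on a plain list before pointing it at a real folder. The helper name `select_targets` and the sample names other than 14-17_CCK_all_2m.csv are assumptions for illustration.

```python
import os

SUFFIX = "2m.csv"


def select_targets(names, suffix=SUFFIX):
    """Keep only the names that end with the given suffix."""
    return [name for name in names if name.endswith(suffix)]


# Hypothetical names: one target, one leftover result, one backup file.
sample = ["14-17_CCK_all_2m.csv", "14-17_CCK_all_1m.csv", "2m.csv.bak"]
print(select_targets(sample))  # ['14-17_CCK_all_2m.csv']

# On the real folder this would be:
# myfiles = select_targets(os.listdir())
```

One practical advantage over `x[-6:]` is that the suffix length is not hard-coded, so changing the suffix cannot leave the slice and the string out of sync.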
Konstantin
  • the first command in your answer has a bug. Should be "==", otherwise it will be a collection of not 2m.csv files ;) – rth Jun 27 '18 at 16:51

The problem seems to be in your for loop: you are iterating over myfiles and modifying it at the same time.

The solution is to filter out the unwanted file names inline, while building the list.

import os
myfiles = [item for item in os.listdir() if item[-6:] == "2m.csv"]
rth

To filter a list in Python the way you want, don't remove items from it inside a for loop that iterates over it. It's better to use a list comprehension.

So it would look like this:

import os
myfiles = [f for f in os.listdir() if f[-6:] == "2m.csv"]

It's cleaner, it's usually faster in benchmarks, and it does the job you want it to do (it's also a lot cleaner than map/filter, but that's my subjective opinion).
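
For comparison, here is a minimal sketch of the same filtering written both ways, so the readability difference can be judged directly. The sample names other than 14-17_CCK_all_2m.csv are hypothetical, and no timing claim is made here.

```python
# Hypothetical folder contents for illustration.
names = ["14-17_CCK_all_2m.csv", "14-17_CCK_all_1m.csv", "18-21_CCK_all_2m.csv"]

# List comprehension: the condition sits next to the loop variable.
with_comprehension = [f for f in names if f[-6:] == "2m.csv"]

# filter(): needs a lambda, plus list() to materialise the result in Python 3.
with_filter = list(filter(lambda f: f[-6:] == "2m.csv", names))

print(with_comprehension == with_filter)  # True
```

Both produce the same list; which one reads better is largely a matter of taste.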

minecraftplayer1234

Modifying a list while iterating over it is likely to produce this sort of discrepancy. If you add a print(item) call before your if statement, you'll probably see that some of the file names never print. The reason is that as you remove items from the list, the remaining items shift down by one index, and the iteration effectively skips over the item that moved into the vacated slot.
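
The skipping can be reproduced with a short experiment. This is only a sketch with made-up file names; the `visited` list is a hypothetical helper that records what the loop actually sees.

```python
# Hypothetical file names, chosen only to illustrate the skipping.
names = ["a_1m.csv", "b_1m.csv", "c_2m.csv", "d_1m.csv", "e_1m.csv"]

visited = []
for item in names:
    visited.append(item)              # record every item the loop reaches
    if not item.endswith("2m.csv"):
        names.remove(item)            # later items shift one slot to the left

print(visited)  # ['a_1m.csv', 'c_2m.csv', 'd_1m.csv']
print(names)    # ['b_1m.csv', 'c_2m.csv', 'e_1m.csv']
```

The loop never visits b_1m.csv or e_1m.csv, because each one slides into the index that was just processed, so both survive in the list even though they fail the test.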

The solution, as given in the linked duplicate, is to use a list comprehension:

myfiles = [item for item in myfiles if item[-6:] == "2m.csv"]

Alternatively, if you prefer to use a for loop, you need to iterate backwards, so that the removal of items (and subsequent re-indexing) doesn't affect the remaining items.

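# Walk the indices from last to first so deletions don't shift unvisited items.
for i in range(len(myfiles) - 1, -1, -1):
    if myfiles[i][-6:] != "2m.csv":
        del myfiles[i]  # delete by index; list.remove() expects a value, not an index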

But the list comprehension method is more concise and more Pythonic.

David Zemens