select files from path

Question

I have files in a particular path and need to select them one by one based on the file name (yyyymmdd.faifb1p16m2.nc), where yyyy is the year, mm is the month, and dd is the day. I wrote this code:

import os

base_dir = 'C:/DATA2013'
os.chdir(base_dir)
# list the directory once and keep only the names with the expected suffix
results = [each for each in os.listdir(base_dir)
           if each.endswith('.faifb1p16m2.nc')]

What should I do next if I want to select only the files for January, then February, and so on? Thank you.

Use glob: http://docs.python.org/2/library/glob.html – alvonellos, Mar 11 '14 at 06:40
See answer I made to this post – alvonellos, Mar 11 '14 at 07:04

score 1 · Answer 1 · answered Mar 11 '14 at 06:33

You can do:

x = [i for i in results if i[4:6] == '01']

This lists all the file names for January, assuming every file follows the naming format described in the question.
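
To cover every month rather than January alone, the same slice can serve as a grouping key. This is only a minimal sketch, assuming every name follows the yyyymmdd.faifb1p16m2.nc pattern; the helper name group_by_month and the sample names are made up for illustration:

```python
from collections import defaultdict

SUFFIX = '.faifb1p16m2.nc'

def group_by_month(names):
    """Map 'mm' -> list of file names, using characters 4-5 of each name."""
    groups = defaultdict(list)
    for name in names:
        if name.endswith(SUFFIX):
            groups[name[4:6]].append(name)
    return dict(groups)

# Hypothetical file names, standing in for the `results` list from the question
sample = ['20130105.faifb1p16m2.nc', '20130212.faifb1p16m2.nc',
          '20130120.faifb1p16m2.nc', 'notes.txt']
by_month = group_by_month(sample)
print(by_month['01'])  # the two January names
```

Looping over sorted(by_month) would then visit January, February, and so on in order.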

Omsai Jadhav

score 0 · Answer 2 · answered Mar 11 '14 at 07:30

To validate the filenames, you could use the datetime.strptime() method:

#!/usr/bin/env python
import os
from datetime import datetime
from glob import glob

suffix = '.faifb1p16m2.nc'

def parse_date(path):
    try:
        return datetime.strptime(os.path.basename(path), '%Y%m%d' + suffix)
    except ValueError:
        return None # failed to parse


paths_by_month = [[] for _ in range(12 + 1)]
for path in glob(r'C:\DATA2013\*' + suffix): # for each nc-file in the directory
    date = parse_date(path)
    paths_by_month[date.month if date else 0].append(path)

print(paths_by_month[2]) # February paths
print(paths_by_month[0]) # paths with unrecognized date
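
What validation buys you here can be seen on a few names. The sketch below is a stripped-down variant of parse_date that works on bare file names; the helper name month_of and the sample names are hypothetical:

```python
from datetime import datetime

SUFFIX = '.faifb1p16m2.nc'

def month_of(name):
    """Return the month (1-12) encoded in the name, or None if the date is invalid."""
    try:
        return datetime.strptime(name, '%Y%m%d' + SUFFIX).month
    except ValueError:
        return None

print(month_of('20130215.faifb1p16m2.nc'))  # a valid date in February
print(month_of('20131301.faifb1p16m2.nc'))  # month 13 does not exist
print(month_of('20130230.faifb1p16m2.nc'))  # February has no 30th day
```

A plain slice such as name[4:6] would happily report '13' for the second name, whereas strptime rejects it.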

score 0 · Answer 3 · edited Mar 11 '14 at 07:50

Try this:

import os

base_dir = 'C:/local'
for f in os.listdir(base_dir):
    # keep the .nc files whose month field (characters 4-5) matches; change '01' as needed
    if f.endswith('.faifb1p16m2.nc') and f[4:6] == '01':
        print(f)

edited Mar 11 '14 at 07:50 by alvonellos
answered Mar 11 '14 at 07:31 by nakkulable

@user3346361, be sure to upvote the answer and mark it as correct. – alvonellos Mar 11 '14 at 08:09

score 0 · Accepted Answer · edited May 23 '17 at 10:33

Two regexes:

\d{4}(?:\d?|\d{2})(?:\d?|\d{2})\.faifb1p16m2\.nc
\d{8}\.faifb1p16m2\.nc

Sample data:

20140131.faifb1p16m2.nc
2014131.faifb1p16m2.nc
201412.faifb1p16m2.nc
201411.faifb1p16m2.nc
20141212.faifb1p16m2.nc
2014121.faifb1p16m2.nc
201411.faifb1p16m2.nc

The first regex matches all 7 of those entries. The second regex matches only entries 1 and 5, the two names with a full eight-digit date. I probably made the regexes more complicated than they needed to be.

You're going to want the second regex, but I'm just listing the first as an example.
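
The match counts above can be checked directly against the sample data. This is just a quick sketch using re.search, as in the code that follows; the nine-digit name at the end is a made-up example showing that re.search does not anchor the pattern, while re.fullmatch does:

```python
import re

re1 = r'\d{4}(?:\d?|\d{2})(?:\d?|\d{2})\.faifb1p16m2\.nc'
re2 = r'\d{8}\.faifb1p16m2\.nc'

samples = [
    '20140131.faifb1p16m2.nc',
    '2014131.faifb1p16m2.nc',
    '201412.faifb1p16m2.nc',
    '201411.faifb1p16m2.nc',
    '20141212.faifb1p16m2.nc',
    '2014121.faifb1p16m2.nc',
    '201411.faifb1p16m2.nc',
]

matched_re1 = [s for s in samples if re.search(re1, s)]
matched_re2 = [s for s in samples if re.search(re2, s)]

print(len(matched_re1))  # 7
print(matched_re2)       # entries 1 and 5

# Hypothetical name with one extra leading digit
extra = '920141212.faifb1p16m2.nc'
print(bool(re.search(re2, extra)))     # True: the pattern may start anywhere
print(bool(re.fullmatch(re2, extra)))  # False: the whole name must fit
```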

from glob import glob
import re

re1 = r'\d{4}(?:\d?|\d{2})(?:\d?|\d{2})\.faifb1p16m2\.nc'
re2 = r'\d{8}\.faifb1p16m2\.nc'

# glob() looks in the current working directory here
l = [f for f in glob('*.faifb1p16m2.nc') if re.search(re1, f)]
m = [f for f in glob('*.faifb1p16m2.nc') if re.search(re2, f)]

print(l)
print()
print(m)
# Then, suppose you want to select everything in the list m whose month is '12'
print(list(filter(lambda x: x[4:6] == '12', m)))

As another, similar solution shows, you can replace glob with os.listdir() (after import os), so:

l = [f for f in glob('*.faifb1p16m2.nc') if re.search(re1, f)]

becomes:

l = [f for f in os.listdir('.') if re.search(re1, f)]

The rest of the code stays the same. One advantage of glob is that the module also provides iglob, which behaves like glob but returns an iterator instead of building the full list, which can help when a directory contains a large number of files.
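
As a rough illustration of the iglob variant, the generator below yields one month's files lazily. The name iter_month is hypothetical, the directory is the one from the question, and the month is taken from characters 4-5 of the file name as in the other answers:

```python
import os
from glob import iglob

SUFFIX = '.faifb1p16m2.nc'

def iter_month(directory, month):
    """Lazily yield paths in `directory` whose file name has `month` ('01'..'12') at characters 4-5."""
    pattern = os.path.join(directory, '*' + SUFFIX)
    for path in iglob(pattern):
        if os.path.basename(path)[4:6] == month:
            yield path

# Usage: prints nothing if the directory does not exist or has no matching files
for path in iter_month('C:/DATA2013', '01'):
    print(path)
```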

One more thing: here's another Stack Overflow post with an overview of Python's lambda feature. It's often used with functions such as map, reduce, and filter.
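
For comparison, here is the filter-with-lambda call next to the equivalent list comprehension. This is a small sketch with made-up file names; note that in Python 3 filter() returns an iterator, so it is wrapped in list():

```python
# Hypothetical names standing in for the list m from the code above
m = ['20141212.faifb1p16m2.nc', '20140131.faifb1p16m2.nc']

# filter() with a lambda that tests the month field
december = list(filter(lambda x: x[4:6] == '12', m))

# the equivalent list comprehension
december_lc = [x for x in m if x[4:6] == '12']

print(december == december_lc)  # True
```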

Thank you very much. Now it works, but I still have problem. I will put it in next question — user3346361, Mar 11 '14 at 08:09
@user3346361, be sure to upvote the answers and mark them as correct. What's your other problem? — alvonellos, Mar 11 '14 at 08:10
