How to read multiple files from different folders in Python

Question

I have yearly data files in different folders. Each file contains daily data ranging from Jan 1 to Dec 31. A file name looks like AS060419.67, where the last four digits (19.67, ignoring the dot) represent the year, i.e. 1967, and 0604 is the folder name.
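To make the naming scheme concrete, here is a minimal sketch that splits such a path into folder name and year. The helper name `parse_station_file` is hypothetical, and the sketch assumes every file follows the `AS060419.67` pattern described above.

```python
import os


def parse_station_file(path):
    """Split a path like '0604/AS060419.67' into (folder_name, year).

    Hypothetical helper; assumes the naming scheme described in the question.
    """
    folder, name = os.path.split(path)
    station_id = os.path.basename(folder)
    # Drop the dot and keep the last four characters: '19.67' -> '1967'
    year = int(name.replace('.', '')[-4:])
    return station_id, year


print(parse_station_file('0604/AS060419.67'))  # ('0604', 1967)
```

Using `os.path` instead of `f.split('/')` keeps the parsing working when the files sit deeper in the tree, for example under `./data/0604/`.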

I tried to read these files using the code below, but it reads only the last year's data in the last folder:

def date_parser(doy, year):    
    return dt.datetime.strptime(doy.zfill(3)+year, '%j%Y')

files = glob.glob('????/AS*')
files.sort()
files
STNS = {}
for f in files:
    stn_id, info = f.split('/')
    year = "".join(info[-5:].split('.'))
    #print (f,stn_id)
    with open(f) as fo:                  
        data = fo.readlines()[:-1]
        data = [d.strip() for d in data]
        data = '\n'.join(data)
        with open('data.dump', 'w') as dump:
            dump.write(data)

parser = lambda date: date_parser(date, year=year)
df = pd.read_table('data.dump', delim_whitespace=True,names=['date','prec'], 
                   na_values='DNA', parse_dates=[0], date_parser=parser, index_col='date' ) 

df.replace({'T': 0})
df = df.apply(pd.to_numeric, args=('coerce',))
df.name = stn_name
df.sid = stn_id

if stn_id not in STNS.keys():
    STNS[stn_name] = df

else:
    STNS[stn_id] = STNS[stn_id].append(df)
    STNS[stn_id].name = df.name
    STNS[stn_id].sid = df.sid
    #outfile.write(line)

To make the plot:

for stn in STNS:
    STNS[stn_id].plot()
    plt.title('Precipitation for {0}'.format(STNS[stn].name))

The problem is that it reads only the last year's data in the last folder. Can anyone help me figure out this problem? Your help will be highly appreciated.

Sounds like you want [os.walk](http://www.tutorialspoint.com/python/os_walk.htm) — willnx, Mar 26 '16 at 09:19
You are overwriting the output data with `open('data.dump', 'w')`. You should probably be opening that file in `'a'` mode. Take a look at the accepted answer to [python open built-in function: difference between modes a, a+, w, w+, and r+?](http://stackoverflow.com/q/1466000/4014959) for info about file modes. — PM 2Ring, Mar 26 '16 at 09:24
@bikuser, can you post a sample of your input files in original format (as text)? 5-7 rows would be enough. I have a feeling that you don't need to loop through your files ... — MaxU - stand with Ukraine, Mar 26 '16 at 09:51
Hi @MaxU, I have attached my sample input files at this link: https://drive.google.com/folderview?id=0B2rkXkOkG7ExRTJxTFJNTXdsV0E&usp=sharing — bikuser, Mar 26 '16 at 10:15
@bikuser, I got them now. Do you want to concatenate data from all files into a single data frame? — MaxU - stand with Ukraine, Mar 26 '16 at 10:52
@MaxU, yes, I want the data from all files in a single data frame. — bikuser, Mar 26 '16 at 11:01

MaxU - stand with Ukraine · Accepted Answer · 2016-03-26T12:26:03.733

You can do it like this:

import os
import glob
import pandas as pd
import matplotlib.pyplot as plt

# file mask
fmask = r'./data/????/AS*.??'

# all RegEx replacements
replacements = {
  r'T': 0
}

# list of data files
flist = glob.glob(fmask)


def read_data(flist, date_col='date', **kwargs):
    dfs = []
    for f in flist:
        # parse year from the file name
        y = os.path.basename(f).replace('.', '')[-4:]
        df = pd.read_table(f, **kwargs)
        # replace day of year with a date
        df[date_col] = pd.to_datetime(y + df[date_col].astype(str).str.zfill(3), format='%Y%j')
        dfs.append(df)
    return pd.concat(dfs, ignore_index=True)


df = read_data(flist,
               date_col='date',
               sep=r'\s+',
               header=None,
               names=['date','prec'],
               engine='python',
               skipfooter=1,
              ) \
     .replace(replacements, regex=True) \
     .set_index('date') \
     .apply(pd.to_numeric, args=('coerce',))


df.plot()

plt.show()

I downloaded only four of the files, so the plot shows only the data from those four.
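The key step in `read_data` is turning the day-of-year column into real dates by prefixing the year and parsing with `%Y%j`. A minimal sketch of that step on made-up rows follows. The sample values are invented for illustration, and treating `T` as 0 and `DNA` as missing follows the question's code, not the actual files.

```python
import io

import pandas as pd

# Made-up sample in the assumed layout: day of year, precipitation
sample = io.StringIO("1 0.0\n32 T\n365 DNA\n")
year = '1967'  # in read_data this is parsed from the file name

# Read everything as strings so the day of year can be zero-padded
df = pd.read_csv(sample, sep=r'\s+', header=None,
                 names=['date', 'prec'], dtype=str)

# '1967' + '032' -> '1967032', parsed as year plus day of year
df['date'] = pd.to_datetime(year + df['date'].str.zfill(3), format='%Y%j')

# 'T' becomes 0; anything non-numeric (here 'DNA') becomes NaN
df['prec'] = pd.to_numeric(df['prec'].replace('T', '0'), errors='coerce')

df = df.set_index('date')
print(df)
```

Because the year is attached to each file's rows before concatenation, rows from different years stay distinguishable in the combined data frame.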

Hi @MaxU, thank you very much for your kind support :) — bikuser, Mar 26 '16 at 16:10
@bikuser, glad I could help :) — MaxU - stand with Ukraine, Mar 26 '16 at 16:12

score 1 · Answer 2 · edited May 23 '17 at 12:16


You overwrite the same file over and over again. Derive your target file name from your source file name. Or use the append mode if you want it all in the same file.

How do you append to a file?
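As a minimal sketch of append mode with `with open(...)`: the only change from the question's code is the mode string `'a'` in place of `'w'`. The temporary directory and the sample rows below are made up for the demonstration.

```python
import os
import tempfile

# Hypothetical target file, created in a temporary directory for the demo
target = os.path.join(tempfile.mkdtemp(), 'data.dump')

# Made-up rows standing in for the contents of two source files
chunks = ['1 0.0\n2 1.5\n', '1 0.2\n2 0.0\n']

for chunk in chunks:
    # 'a' appends to the end of the file; 'w' would truncate it each time
    with open(target, 'a') as dump:
        dump.write(chunk)

with open(target) as f:
    lines = f.read().splitlines()

print(len(lines))  # 4: both chunks are kept
```

Two caveats: an appended file keeps growing across repeated runs unless it is deleted first, and rows that carry only a day of year lose track of which year they came from once they are merged into one file.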

Jacques de Hooge · answered Mar 26 '16 at 09:23 · edited by Community May 23 '17 at 12:16

Hi Jacques, how do I use append mode with the open function? Could you help me please? — bikuser, Mar 26 '16 at 09:30
