I have yearly data files in different folders. Each file contains daily data from Jan 1 to Dec 31. A file name looks like AS060419.67, where the last four digits represent the year (i.e. 1967) and 0604 is the folder name.

I tried to read these files with the code below, but it reads only the last year's data in the last folder:

def date_parser(doy, year):    
    return dt.datetime.strptime(doy.zfill(3)+year, '%j%Y')

files = glob.glob('????/AS*')
files.sort()
files
STNS = {}
for f in files:
    stn_id, info = f.split('/')
    year = "".join(info[-5:].split('.'))
    #print (f,stn_id)
    with open(f) as fo:                  
        data = fo.readlines()[:-1]
        data = [d.strip() for d in data]
        data = '\n'.join(data)
        with open('data.dump', 'w') as dump:
            dump.write(data)

parser = lambda date: date_parser(date, year=year)
df = pd.read_table('data.dump', delim_whitespace=True,names=['date','prec'], 
                   na_values='DNA', parse_dates=[0], date_parser=parser, index_col='date' ) 

df.replace({'T': 0})
df = df.apply(pd.to_numeric, args=('coerce',))
df.name = stn_name
df.sid = stn_id

if stn_id not in STNS.keys():
    STNS[stn_name] = df

else:
    STNS[stn_id] = STNS[stn_id].append(df)
    STNS[stn_id].name = df.name
    STNS[stn_id].sid = df.sid
    #outfile.write(line)

To make the plot:

for stn in STNS:
    STNS[stn_id].plot()
    plt.title('Precipitation for {0}'.format(STNS[stn].name))

The problem is that it reads only the last year's data in the last folder. Can anyone help me figure out what is wrong? Your help will be highly appreciated.

bikuser

2 Answers

You can do it like this:

import os
import glob
import pandas as pd
import matplotlib.pyplot as plt

# file mask
fmask = r'./data/????/AS*.??'

# all RegEx replacements
replacements = {
  r'T': 0
}

# list of data files
flist = sorted(glob.glob(fmask))


def read_data(flist, date_col='date', **kwargs):
    dfs = []
    for f in flist:
        # parse year from the file name
        y = os.path.basename(f).replace('.', '')[-4:]
        df = pd.read_table(f, **kwargs)
        # replace day of year with a date
        df[date_col] = pd.to_datetime(y + df[date_col].astype(str).str.zfill(3), format='%Y%j')
        dfs.append(df)
    return pd.concat(dfs, ignore_index=True)


df = read_data(flist,
               date_col='date',
               sep=r'\s+',
               header=None,
               names=['date','prec'],
               engine='python',
               skipfooter=1,
              ) \
     .replace(replacements, regex=True) \
     .set_index('date') \
     .apply(pd.to_numeric, errors='coerce')


df.plot()

plt.show()

I downloaded only four files, so the plot shows just the data from those files:

[image: plot of the data from the four downloaded files]
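
To make the two steps inside read_data easier to check by hand (taking the year from the file name, then turning a day of year into a date), here is a minimal sketch using only the standard library. The file name is the one from the question. The helper names parse_year and doy_to_date are made up for this illustration and are not part of the code above.

```python
import datetime as dt
import os


def parse_year(path):
    """Return the four-digit year encoded at the end of the file name.

    'AS060419.67' -> 'AS06041967' -> '1967'
    """
    return os.path.basename(path).replace('.', '')[-4:]


def doy_to_date(doy, year):
    """Convert a day of year (1-366) plus a year string into a date."""
    # zero-pad the day to three digits so that '%j' can parse it
    return dt.datetime.strptime(year + str(doy).zfill(3), '%Y%j').date()


year = parse_year('0604/AS060419.67')
print(year)                    # 1967
print(doy_to_date(1, year))    # 1967-01-01
print(doy_to_date(365, year))  # 1967-12-31
```

The pandas version in read_data does the same thing for a whole column at once, which is why the day-of-year column is padded with str.zfill(3) before pd.to_datetime is called.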

MaxU - stand with Ukraine

You overwrite the same file over and over again. Derive your target file name from your source file name. Or use the append mode if you want it all in the same file.

How do you append to a file?
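
A small sketch of the difference between the two file modes, assuming two made-up chunks of text standing in for two yearly files (the chunk contents and the .dump file names are hypothetical):

```python
import os
import tempfile

# Hypothetical stand-ins for the contents of two yearly files.
chunks = ['1 0.0\n2 1.5\n', '1 0.2\n2 0.0\n']

with tempfile.TemporaryDirectory() as tmp:
    overwritten = os.path.join(tmp, 'overwritten.dump')
    appended = os.path.join(tmp, 'appended.dump')

    for chunk in chunks:
        # 'w' truncates the file on every open, so only the last chunk survives
        with open(overwritten, 'w') as fo:
            fo.write(chunk)
        # 'a' keeps what is already in the file and adds to the end
        with open(appended, 'a') as fo:
            fo.write(chunk)

    with open(overwritten) as fo:
        overwritten_text = fo.read()
    with open(appended) as fo:
        appended_text = fo.read()

print(repr(overwritten_text))  # only the second chunk
print(repr(appended_text))     # both chunks, in order
```

If you would rather keep one dump file per source file, you could build the target name from the source name inside the loop, for example f + '.dump' (again just an illustration).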

Jacques de Hooge