How do I avoid repetition in python codes?

Question

I have data from an instrument installed (dendrometer) in 9 different trees, the variables are: date and time, increment and temperature. I go once a month to the field and download more data. Right now I have done 3 field trips, therefore I call the files "den11, den12, den13..." being the first number related to the tree (number 1) and the second related to the trip (1, 2, 3). I have a few routines to run before I concatenate them and end up with only 9 (dendrom1, dendrom2, dendrom3...) so I can perform some plotting and analysis. But so far I have done a lot copy and paste in my codes which takes a while, is boring and looks terrible. I've tried for loop but I'm a python newbie, learning by myself and this part I haven't cracked.

For example, to read each excel file I had to:

#Tree1
den11= pd.read_excel('den11.xlsx') 
den12= pd.read_excel('den12.xlsx') 
den13= pd.read_excel('den13.xlsx')

#Tree2
den21= pd.read_excel('den21.xlsx')
den22= pd.read_excel('den22.xlsx') 
den23= pd.read_excel('den23.xlsx')
...
#Tree9

Then, to avoid repeating 3 times for each of the nine trees I tried to recreate each of the filenames and assign it to 'f':

trips = [1,2,3]
trees = range(1,10)
for tree in trees:
    for trip in trips:
         f = 'den' + str(tree) + str(trip)
         print(f)

And then I could, maybe read each of them and have their names assigned as a new variable, but I'm clearly not good and I'm missing something here:

os.chdir('...\Plantation\Dendrometers')
basepath = '...\Plantation\Dendrometers'
dlist = os.scandir(basepath)
for dendrometer in dlist:
    f = pd.read_excel(dendrometer)

(I used 'os.scandir' instead of 'os.listdir' because I read that scandir can be interate, I thought that could be an issue)

It didn't work, then I tried assigning a list with all the filenames:

flist = ['den11','den12','den13','den21','den22','den23','den31',
    'den32','den33','den41','den42','den43','den51','den52',
    'den53','den61','den62','den63','den71','den72','den73',
    'den81','den82','den83','den91','den92','den93']

which didn't work either, I guess I can't perform functions using turples.

What would be best not to repeat basic routines for each of the files and be prepared for next data coming up? This is what I've done and it feels terrible:

new_columns = ['date','increment','temp']
den11.columns = new_columns
den12.columns = new_columns
den13.columns = new_columns
den21.columns = new_columns
...

den11.set_index('date', inplace=True)
den12.set_index('date', inplace=True)
...

den11 = den11.loc['2019-02-14':]
den12 = den12.loc['2019-02-14':]
...

dendrom1 = pd.concat([den11,den12,den13])
...

dendrom1 = dendrom1.loc[~dendrom1.index.duplicated(keep='first')]
...dendrom9 = dendrom9.loc[~dendrom9.index.duplicated(keep='first')]

It would be amazing if I could just add one trip, load the folder with new filenames and run the code generating the merged file 'dendrom' for each tree.

Good observation; it does look terrible. What's more terrible is maintaining and extending it :) — ChiefTwoPencils, Jul 11 '19 at 00:32
Related: [How do I create a variable number of variables?](https://stackoverflow.com/q/1373164/4518341) — wjandrea, Jul 11 '19 at 00:36

score 3 · Accepted Answer · answered Jul 11 '19 at 00:34

3

Try os.listdir:

d = {}
for i in os.listdir():
    if '.xlsx' in i:
        df = pd.read_excel(i)
        # do all your operations here that you do for every dataframe
        ...
        d[i] = df

And to extract a specific dataframe, use:

print(d[excel file name])

Then it will output the dataframe you wanted.

answered Jul 11 '19 at 00:34

U13-Forward

69,221
14
89
114

Thank you @U10-Forward ! It worked for most of the operations... Now I'm trying to do another loop to concatenate the files from the same tree, such as: dendrom1 = pd.concat([d['den11.xlsx'],d['den12.xlsx'],d['den13.xlsx']]) I guess only this one I'll have to repeat (9 times). – Beto Jul 11 '19 at 06:07

How do I avoid repetition in python codes?

1 Answers1