So let's say I have a DataFrame that looks like this.
x1 x2 x3 y1 y2 y3 z1 z2 z3
1 10 10.1 9.9 1 2 3 4 5 6
2 11 11.1 10.9 2 3 4 5 6 7
...
I want to add three columns called [xave, yave, zave],
and have each element in them be the average of the corresponding three columns in the df above (this is just an example; I actually need to do this operation 6 times, on 6 sets of three columns).
x1 x2 x3 y1 y2 y3 z1 z2 z3 xave yave zave
1 10 10.1 9.9 1 2 3 4 5 6 10 2 5
2 11 11.1 10.9 2 3 4 5 6 7 11 3 6
...
Right now I am doing this by looping over the DataFrame's index and calling df.set_value(index, col, value)
at each index value:
for index in df.index:
    df.set_value(index, 'xave', np.average(df.iloloc[index, 0:3]) if False else np.average(df.iloc[index, 0:3]))
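Something like the following vectorized version is what I'm imagining, sketched against the example frame above (df[cols].mean(axis=1) would replace the per-row loop; column names are from my example):

```python
import pandas as pd

# Example frame matching the data above
df = pd.DataFrame({
    "x1": [10, 11], "x2": [10.1, 11.1], "x3": [9.9, 10.9],
    "y1": [1, 2], "y2": [2, 3], "y3": [3, 4],
    "z1": [4, 5], "z2": [5, 6], "z3": [6, 7],
})

# Row-wise mean over each set of three columns, with no Python-level loop
for prefix in ["x", "y", "z"]:
    cols = [f"{prefix}{i}" for i in range(1, 4)]
    df[f"{prefix}ave"] = df[cols].mean(axis=1)

print(df[["xave", "yave", "zave"]])
```

The same loop over prefixes would extend to the 6 sets of three columns in the real data.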
I feel like there has to be a way to do this with df.apply()
or maybe a lambda function. I mainly need to speed this up; otherwise it takes too long to analyze a large number of files. I have another script that loops through a directory, grabs all the .csv files inside it, performs this analysis, and saves the filename and values as an array. The directory holds ~3000 files, and the amount of data varies from 200 KB to 4000 KB per file.
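For context, the directory loop is roughly this shape (the directory, filename, and the per-file analysis here are placeholders; this sketch writes one small CSV to a temp directory just so it runs standalone):

```python
import glob
import os
import tempfile

import pandas as pd

# Write one small CSV into a temp directory so the sketch is self-contained
tmpdir = tempfile.mkdtemp()
pd.DataFrame({"x1": [10, 11], "x2": [10.1, 11.1]}).to_csv(
    os.path.join(tmpdir, "sample.csv"), index=False
)

results = []
for path in glob.glob(os.path.join(tmpdir, "*.csv")):
    frame = pd.read_csv(path)
    # ... the per-file analysis would go here ...
    results.append((os.path.basename(path), len(frame)))
```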
I am not sure of the exact timing; I am looking up how to measure that now.
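One quick way to time a candidate approach, assuming the vectorized mean is the operation under test (the frame size here is arbitrary):

```python
import time

import numpy as np
import pandas as pd

# Build a 100k-row frame of random values to time against
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100_000, 3)), columns=["x1", "x2", "x3"])

start = time.perf_counter()
df["xave"] = df[["x1", "x2", "x3"]].mean(axis=1)
elapsed = time.perf_counter() - start
print(f"vectorized mean over 100k rows: {elapsed:.4f} s")
```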