
So let's say I have a DataFrame that looks like this:

    x1    x2    x3    y1    y2    y3    z1    z2    z3
1   10    10.1  9.9   1     2     3     4     5     6
2   11    11.1  10.9  2     3     4     5     6     7
...

I want to add three columns called [xave, yave, zave], where each element is the row-wise average of the corresponding three columns in the df above (x1-x3, y1-y3, z1-z3). This is just an example; I need to do this operation 6 times on 6 sets of three columns.

    x1    x2    x3    y1    y2    y3    z1    z2    z3    xave    yave    zave
1   10    10.1  9.9   1     2     3     4     5     6     10      2       5
2   11    11.1  10.9  2     3     4     5     6     7     11      3       6
...

Right now I am doing this by looping over the DataFrame's index and calling df.set_value(index, col, val) at each index value.

    for index in df.index:
        # Average x1..x3 for this row and store it in the new column.
        df.set_value(index=index, col='xave', value=np.average(df.loc[index, 'x1':'x3']))

I feel like there has to be a way to do this with df.apply() or maybe a lambda function. I need to speed this up, because otherwise it takes too long to analyze a large number of files. I have another script that loops through a directory and grabs all the .csv files inside it. It then performs the analysis I am doing and saves the filename and values as an array. The directory holds ~3000 files, and each file is between 200 KB and 4,000 KB.
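For reference, here is a minimal sketch of what a vectorized version might look like, assuming the columns are named exactly as in the example table. The `groups` mapping is a hypothetical helper introduced only for illustration; it would need one entry per set of three columns.

```python
import pandas as pd

# Sample data copied from the example table in the question.
df = pd.DataFrame(
    {
        "x1": [10, 11], "x2": [10.1, 11.1], "x3": [9.9, 10.9],
        "y1": [1, 2], "y2": [2, 3], "y3": [3, 4],
        "z1": [4, 5], "z2": [5, 6], "z3": [6, 7],
    },
    index=[1, 2],
)

# Hypothetical mapping: new column name -> the columns it averages.
groups = {
    "xave": ["x1", "x2", "x3"],
    "yave": ["y1", "y2", "y3"],
    "zave": ["z1", "z2", "z3"],
}

for new_col, cols in groups.items():
    # axis=1 computes the mean across the selected columns for every row at once.
    df[new_col] = df[cols].mean(axis=1)

print(df)
```

This only loops over the column groups (3 here, 6 in the real data), not over the rows, which is usually where the time goes.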

I am not sure of the exact timing; I am looking up how to measure that now.
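One way the timing could be measured is with the standard-library `timeit` module. The sketch below is only an illustration: the synthetic 500-row frame and the function names `loop_mean` and `vectorized_mean` are assumptions, not part of my real script. The loop uses `df.at` instead of `set_value`, since `set_value` is no longer available in newer pandas releases.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic frame standing in for one CSV file (the size is an assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((500, 3)), columns=["x1", "x2", "x3"])


def loop_mean(frame):
    # Row-by-row version, similar in spirit to the loop in the question.
    out = frame.copy()
    out["xave"] = np.nan
    for idx in out.index:
        out.at[idx, "xave"] = np.average(out.loc[idx, ["x1", "x2", "x3"]])
    return out


def vectorized_mean(frame):
    # Column-wise version: one call computes every row's mean.
    out = frame.copy()
    out["xave"] = out[["x1", "x2", "x3"]].mean(axis=1)
    return out


# timeit.timeit returns the total seconds taken for `number` calls.
loop_seconds = timeit.timeit(lambda: loop_mean(df), number=3)
vector_seconds = timeit.timeit(lambda: vectorized_mean(df), number=3)

print(f"loop:       {loop_seconds:.4f} s for 3 runs")
print(f"vectorized: {vector_seconds:.4f} s for 3 runs")
```

The actual numbers will depend on the machine and the file sizes, so they are not shown here.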

awsmagala
  • Also respectfully - you might want to take an hour or two to read over the first few sections of the Pandas docs, it could help considerably with your Pandas workflow. – miradulo Oct 11 '17 at 21:00
  • The built-in way to time code is to [use the timeit module](https://stackoverflow.com/questions/8220801/how-to-use-timeit-module#). An easier way is to use [IPython's `%timeit` command](https://stackoverflow.com/q/29280470/190597). – unutbu Oct 11 '17 at 21:01
  • I'm sorry, I am sure this is a basic question. I am working on improving my understanding of Python and pandas, but I don't use them every day, so sometimes the indexing of DataFrames confuses me. I edited my question to make it clearer and to explain why it is different from the question y'all flagged. – awsmagala Oct 12 '17 at 00:00
