
I'm using Pandas 0.13.0 and trying to compute a sliding average based on the values of the index.

The index values are not evenly spaced. The index is sorted in increasing order and its values are unique.

import pandas as pd
import quantities as pq

f = { 
    'A': [ 0.0,  0.1,  0.2,  0.5,  1.0,  1.4,  1.5] * pq.m,
    'B': [10.0, 11.0, 12.0, 15.0, 20.0, 30.0, 50.0] * pq.kPa      
}

df = pd.DataFrame(f)

df.set_index(df['A'], inplace=True)

The DataFrame gives:

in: print df

out:
      A       B
A                 
0.00  0.00 m  10.0 kPa
0.10  0.10 m  11.0 kPa
0.20  0.20 m  12.0 kPa
0.50  0.50 m  15.0 kPa
1.00  1.00 m  20.0 kPa
1.40  1.40 m  30.0 kPa
1.50  1.50 m  50.0 kPa

Now I would like to compute, for each index value x, the average of column B over the rows whose index lies between x and x+c, where c is a user-defined criterion.

For the sake of this example, c = 0.40.

The averaging process would give:

      A       B          C
A                 
0.00  0.00 m  10.0 kPa   11.0 kPa  = (10.0 + 11.0 + 12.0) / 3
0.10  0.10 m  11.0 kPa   12.7 kPa  = (11.0 + 12.0 + 15.0) / 3
0.20  0.20 m  12.0 kPa   13.5 kPa  = (12.0 + 15.0) / 2
0.50  0.50 m  15.0 kPa   15.0 kPa  = (15.0) / 1
1.00  1.00 m  20.0 kPa   25.0 kPa  = (20.0 + 30.0) / 2
1.40  1.40 m  30.0 kPa   40.0 kPa  = (30.0 + 50.0) / 2
1.50  1.50 m  50.0 kPa   50.0 kPa  = (50.0) / 1

Note that because the index values are not evenly spaced, sometimes x+c won't be found in the index. That is OK for now, though I will definitely add a way to estimate the value at x+c from the values just before and just after x+c, so that I get a more accurate average.
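One possible way to include that endpoint is sketched below. It assumes that linear interpolation between the two neighbouring rows is an acceptable estimate of B at x+c, that the values are plain floats (units stripped), and that nothing is extrapolated past the last index value. The names `x_vals`, `b_vals` and `window_mean_with_endpoint` are made up for this illustration.

```python
import numpy as np

# Index values (column A) and data values (column B) from the example, without units.
x_vals = np.array([0.0, 0.1, 0.2, 0.5, 1.0, 1.4, 1.5])
b_vals = np.array([10.0, 11.0, 12.0, 15.0, 20.0, 30.0, 50.0])

def window_mean_with_endpoint(x, c):
    """Mean of B over [x, x + c], adding an interpolated sample at x + c."""
    end = x + c
    # Rows whose index falls inside the closed window.
    inside = (x_vals >= x) & (x_vals <= end)
    samples = list(b_vals[inside])
    # Add an interpolated endpoint only when x + c lies strictly inside the
    # data range and is not already one of the index values.
    if end < x_vals[-1] and not np.any(x_vals == end):
        samples.append(np.interp(end, x_vals, b_vals))
    return float(np.mean(samples))

means = [window_mean_with_endpoint(x, 0.4) for x in x_vals]
```

With this rule the first row averages 10, 11, 12 and the interpolated value at 0.4, instead of only the three rows found in the index. Whether the extra sample should carry the same weight as the others is a separate design choice that this sketch does not address.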

I tried the solution found here from Zelazny7: pandas rolling computation with window based on values instead of counts

But I can't make it work for my case, where the search is made on the index.

I also looked at: Pandas Rolling Computations on Sliding Windows (Unevenly spaced)

But I don't understand how to apply it to my case.

Any idea how to solve this problem with an efficient pandas approach (using apply, map or rolling)?

Thanks.

Julien

1 Answer


To adapt the answer you linked to, turn the index into a Series so that you can call apply on it. The other key point is that the constructed Series must be given the same index as your df; by default a new integer index (0, 1, 2, 3, ...) is created, and the result would then not align with df's rows.

In [26]:

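def f(x, c):
    # Select the B values whose index lies in the closed window [x, x + c].
    ser = df.loc[(df.index >= x) & (df.index <= x + c), 'B']
    return ser.mean()

# Build a Series of the index values, indexed like df, so that apply() passes
# each index value to f and the result aligns with df's rows.
df['C'] = pd.Series(data=df.index, index=df.index).apply(lambda x: f(x, c=0.4))
df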

Out[26]:
       A   B          C
A                      
0.0  0.0  10  11.000000
0.1  0.1  11  12.666667
0.2  0.2  12  13.500000
0.5  0.5  15  15.000000
1.0  1.0  20  25.000000
1.4  1.4  30  40.000000
1.5  1.5  50  50.000000
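
The apply-based version scans the whole frame once per row. As a rough alternative for larger data, here is a minimal sketch that relies on the index being sorted (as stated in the question) and uses `numpy.searchsorted` with prefix sums. It assumes plain float values, so any units would have to be reattached afterwards, and `forward_window_mean` is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd

def forward_window_mean(series, c):
    """Mean of `series` over index values in [x, x + c] for every index value x.

    Assumes the index is sorted in increasing order.
    """
    idx = series.index.values.astype(float)
    vals = series.values.astype(float)
    # Each window starts at its own row; `stop` is one past the last row
    # whose index value is <= x + c.
    start = np.arange(len(idx))
    stop = np.searchsorted(idx, idx + c, side="right")
    # Prefix sums turn every window sum into a single subtraction.
    csum = np.concatenate(([0.0], np.cumsum(vals)))
    means = (csum[stop] - csum[start]) / (stop - start)
    return pd.Series(means, index=series.index)

df = pd.DataFrame(
    {"B": [10.0, 11.0, 12.0, 15.0, 20.0, 30.0, 50.0]},
    index=pd.Index([0.0, 0.1, 0.2, 0.5, 1.0, 1.4, 1.5], name="A"),
)
df["C"] = forward_window_mean(df["B"], c=0.4)
```

On the example data this gives the same column C as the apply-based version. Both versions compare floating-point index values against x + c, so a window edge that lands very close to an index value may be included or excluded depending on rounding.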
EdChum
  • Hi, many thanks for your answer. It definitely helps. However, I lose the unit after the averaging process. Column C should return the results with 'kPa' as units. If I print ser.mean() inside the f function, the unit is attached, but it is lost when the result is returned. – Julien Oct 07 '14 at 09:34
  • @Julien I tried to use your quantities module but it did nothing for my data, you could either add this to the function e.g. `return ser.mean() * pq.kPa` or do this after the column is generated e.g. `df['C'] = df['C'] * pq.kPa` – EdChum Oct 07 '14 at 09:52
  • @Julien the problem could be that I am running a different version; I'm using pandas `0.14.1`, numpy `1.9.0` and python `3.3.2` 64-bit – EdChum Oct 07 '14 at 09:55
  • OK, thanks for the clarification. Pandas 0.14.1 is not working for me for some reason (I can't find a way to install it with pip wheel, and it's an IT requirement for me not to use only the exe). The only way to deal with the unit with my current config is to do: `unit = df['B'].values[0].units` then `df['C'] = [x*unit for x in df['C'].values]`. I'll mark your answer as solved, as it seems I have an issue with Pandas on my side. Thanks again for your help. – Julien Oct 07 '14 at 10:20
  • For info, I'm using Pandas 0.13.0, numpy 1.8.1 and python 2.6.6. I have to stick to python 2.6.6 for IT reasons, but I can investigate Pandas 0.14.1. Cheers. – Julien Oct 07 '14 at 10:22