I know about pandas' resampling functions for a DatetimeIndex.

But how can I easily resample/group along an integer index?

The following code illustrates the problem and works:

import numpy as np
import pandas as pd


df = pd.DataFrame(np.random.randint(5, size=(10, 2)), columns=list('AB'))
print(df)

   A  B
0  3  2
1  1  1
2  0  1
3  2  3
4  2  0
5  4  0
6  3  1
7  3  4
8  0  2
9  4  4

# sum of n consecutive elements
n = 3
tuples = [(i, i+n-1) for i in range(0, len(df.index), n)]
df_new = pd.concat([df.loc[i[0]:i[1]].sum() for i in tuples], axis=1).T
print(df_new)

   A  B
0  4  4
1  8  3
2  6  7
3  4  4

But isn't there a more elegant way to accomplish this?

The code seems a bit heavy-handed to me.

Thanks in advance!

Cord Kaldemeyer
  • https://stackoverflow.com/questions/37396264/pandas-equivalent-of-resample-for-integer-index Check if this solves your issue. I think it does -- no way to test right now -- but you need to reset your index after. :) – WGS Nov 09 '17 at 08:26
  • I think my own approach is already easier ;-). Thanks anyway! – Cord Kaldemeyer Nov 09 '17 at 09:46

2 Answers

You can floor-divide the index by n and aggregate the resulting groups with a function such as sum:

df1 = df.groupby(df.index // n).sum()

If the index is not the default one (integer, unique), group by a floor-divided numpy.arange of the DataFrame's length instead:

df1 = df.groupby(np.arange(len(df)) // n).sum()
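A minimal sketch of the position-based variant, assuming a string index (the labels 'a' to 'j' are hypothetical) and reusing the values from the question's first table instead of random data:

```python
import numpy as np
import pandas as pd

n = 3

# Values copied from the question's first table; the string index is an
# assumption made only to show a non-default index.
df = pd.DataFrame(
    {"A": [3, 1, 0, 2, 2, 4, 3, 3, 0, 4],
     "B": [2, 1, 1, 3, 0, 0, 1, 4, 2, 4]},
    index=list("abcdefghij"),
)

# Group labels depend on row position only: 0, 0, 0, 1, 1, 1, 2, 2, 2, 3
groups = np.arange(len(df)) // n

df1 = df.groupby(groups).sum()
print(df1)
```

Since 10 is not a multiple of 3, the last group holds only the single remaining row.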
jezrael

You can use groupby on the integer division of the index by n, i.e.

df.groupby(lambda i: i//n).sum()

Here is the full code:

import numpy as np
import pandas as pd

n=3
df = pd.DataFrame(np.random.randint(5, size=(10, 2)), columns=list('AB'))

print('df:')
print(df)
res = df.groupby(lambda i: i//n).sum()
print('using groupby:')
print(res)

tuples = [(i, i+n-1) for i in range(0, len(df.index), n)]
df_new = pd.concat([df.loc[i[0]:i[1]].sum() for i in tuples], axis=1).T
print('using your method:')
print(df_new)

And the output:

df:
   A  B
0  1  0
1  3  0
2  1  1
3  0  4
4  3  4
5  0  1
6  0  4
7  4  0
8  0  2
9  2  2
using groupby:
   A  B
0  5  1
1  3  9
2  4  6
3  2  2
using your method:
   A  B
0  5  1
1  3  9
2  4  6
3  2  2
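Because the data above is random, the printed numbers will differ on every run. As a reproducible check, the sketch below hard-codes the values printed above (an assumption made for the sake of the example) and compares the function-based groupby with grouping on the floor-divided index directly. When a callable is passed to groupby, pandas calls it on each index label, so both forms should produce the same groups on a default integer index.

```python
import pandas as pd

n = 3

# Values copied from the printed df above, on the default integer index.
df = pd.DataFrame(
    {"A": [1, 3, 1, 0, 3, 0, 0, 4, 0, 2],
     "B": [0, 0, 1, 4, 4, 1, 4, 0, 2, 2]}
)

# The callable receives each index label (0, 1, 2, ...) and returns its group.
by_lambda = df.groupby(lambda i: i // n).sum()

# Same grouping, computed on the whole index at once.
by_index = df.groupby(df.index // n).sum()

print(by_lambda)
print((by_lambda.values == by_index.values).all())
```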
sgDysregulation