compute the mean values over every n elements of different groups

Question

Imagine I have a DataFrame like the following in IPython:

import pandas as pd

df = pd.DataFrame({
    'A': ['1', '1', '1', '1', '1', '1', '2', '2', '2', '2', '2', '2'],
    'B': ['00:00', '00:10', '00:20', '00:30', '01:10', '01:20', '00:00', '00:10', '00:20', '00:30', '01:10', '01:20'],
    'C': [2, 3, 4, 2, 4, 5, 6, 7, 1, 5, 6, 4],
})

What I need is to group by A and then compute the mean of C over every 2 (in general, n) consecutive rows of each group. I need to do this for a very large dataset with more than 4K groups.
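For reference, one vectorized way to express this in pandas is to number each row within its A group with `GroupBy.cumcount()`, integer-divide that position by n to get a block id, and group by (A, block). This is a sketch that assumes the rows are already in the desired order within each group; the helper name `block_means` is hypothetical.

```python
import pandas as pd


def block_means(df: pd.DataFrame, n: int) -> pd.Series:
    """Hypothetical helper: mean of C over consecutive blocks of n rows within each A group."""
    # Position of each row inside its A group: 0, 1, 2, ... restarting for every group
    pos = df.groupby('A').cumcount()
    # Positions 0..n-1 -> block 0, n..2n-1 -> block 1, and so on
    block = (pos // n).rename('block')
    return df.groupby([df['A'], block])['C'].mean()


df = pd.DataFrame({
    'A': ['1'] * 6 + ['2'] * 6,
    'B': ['00:00', '00:10', '00:20', '00:30', '01:10', '01:20'] * 2,
    'C': [2, 3, 4, 2, 4, 5, 6, 7, 1, 5, 6, 4],
})

print(block_means(df, 2))
```

Because everything happens inside a single `groupby`, there is no Python-level loop over the 4K+ groups; if a group's length is not a multiple of n, its last block simply averages fewer rows.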

I have tried pandas, and I think it could be a useful library for this.

Yes, I have tried: grouped = df.groupby('A'), then I can access each group with group1 = grouped.get_group('1'), get every 2 elements of a group with group1[0:2], and run all of that in a for loop, which takes a lot of time! – KOrrosh Sh, Jun 15 '15 at 07:42
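The loop described in this comment might look roughly like the sketch below. It is a reconstruction under assumptions, not the commenter's actual code, and the function name `loop_block_means` is hypothetical. The cost comes from the Python-level iteration over every group and every slice.

```python
import pandas as pd


def loop_block_means(df: pd.DataFrame, n: int) -> list:
    """Hypothetical helper: per-group, per-slice loop along the lines of the comment."""
    results = []
    grouped = df.groupby('A')
    for key in grouped.groups:  # one Python iteration per group
        group = grouped.get_group(key)
        # Step through the group n rows at a time: rows [0:n], [n:2n], ...
        for start in range(0, len(group), n):
            chunk = group.iloc[start:start + n]
            results.append((key, chunk['C'].mean()))
    return results


df = pd.DataFrame({
    'A': ['1'] * 6 + ['2'] * 6,
    'B': ['00:00', '00:10', '00:20', '00:30', '01:10', '01:20'] * 2,
    'C': [2, 3, 4, 2, 4, 5, 6, 7, 1, 5, 6, 4],
})

print(loop_block_means(df, 2))
```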
Your answer probably lies in learning [what you can do with a DataFrame](http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.html) (obviously not a short task). — leewz, Jun 15 '15 at 09:51

steboc · Answer 1 · score 1 · 2015-06-15T10:19:50.840

This solution works with your example:

df.groupby(['A',(df.index/2).astype(int)])['C'].mean()
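The key `(df.index/2).astype(int)` (the same as `df.index // 2` for a non-negative integer index) numbers rows by their global position, not their position inside the group. It therefore relies on the default RangeIndex and on every group having an even number of rows, which holds for the 6-row groups in the question. The sketch below uses hypothetical data with a 3-row group to compare that key with a per-group key built from `GroupBy.cumcount()`.

```python
import pandas as pd

# Hypothetical data in which group '1' has an odd number of rows (3)
df = pd.DataFrame({
    'A': ['1', '1', '1', '2', '2', '2'],
    'C': [2, 3, 4, 6, 7, 1],
})

# Key based on the global row position, as in the answer
global_key = df.index // 2
# Key based on the row's position within its own A group
group_key = df.groupby('A').cumcount() // 2

print(global_key.tolist())  # [0, 0, 1, 1, 2, 2]
print(group_key.tolist())   # [0, 0, 1, 0, 0, 1]

by_global = df.groupby(['A', global_key])['C'].mean()
by_group = df.groupby(['A', group_key])['C'].mean()
print(by_global)
print(by_group)
```

With the global key, the first row of group '2' ends up alone while its last two rows are paired; the per-group key pairs rows 0 and 1 of each group as intended.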

Edit: a more versatile solution, independent of the index:

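# Rank rows by B within each A group (1, 2, 3, ...), shift the ranks to start at 0,
# then floor-divide by 2 so that each consecutive pair shares a block number
g1 = ((df.groupby(['A'])['B'].rank() - 1) // 2).astype(int)
df.groupby(['A', g1])['C'].mean()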
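One detail worth checking with the rank-based key: `rank()` numbers rows starting at 1, so dividing the rank by 2 without subtracting 1 leaves the first row of each group alone in block 0 and shifts every later pair by one row. The sketch below compares both variants. To avoid depending on how pandas ranks string columns, it ranks a numeric `minutes` column derived from B; that conversion assumes B is always a zero-padded "HH:MM" string.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['1'] * 6 + ['2'] * 6,
    'B': ['00:00', '00:10', '00:20', '00:30', '01:10', '01:20'] * 2,
    'C': [2, 3, 4, 2, 4, 5, 6, 7, 1, 5, 6, 4],
})
n = 2

# Assumption: B is zero-padded "HH:MM"; convert it to minutes so that
# the ranking below operates on numbers rather than strings
minutes = df['B'].str[:2].astype(int) * 60 + df['B'].str[3:].astype(int)

# Within each A group the ranks are 1.0, 2.0, ..., 6.0 (no ties in this data)
r = minutes.groupby(df['A']).rank()

off_by_one = (r / n).astype(int)         # 0, 1, 1, 2, 2, 3 within each group
zero_based = ((r - 1) // n).astype(int)  # 0, 0, 1, 1, 2, 2 within each group

wrong = df.groupby(['A', off_by_one])['C'].mean()
right = df.groupby(['A', zero_based])['C'].mean()
print(wrong)
print(right)
```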

answered Jun 15 '15 at 10:05 by steboc, edited Jun 15 '15 at 10:19

It looks like a nice answer; I tried it, but the result was not correct. I wrote a loop over all of them instead, and it works reasonably well. I'll try to reuse yours... – KOrrosh Sh, Jun 15 '15 at 10:41

score 0 · Answer 2 · edited May 23 '17 at 12:24

I don't know pandas, but here's plain Python.

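A = ['1', '1', '1', '1', '1', '1', '2', '2', '2', '2', '2', '2']
B = ['00:00', '00:10', '00:20', '00:30', '01:10', '01:20', '00:00', '00:10', '00:20', '00:30', '01:10', '01:20']
C = [2, 3, 4, 2, 4, 5, 6, 7, 1, 5, 6, 4]

# Average the C values of each consecutive pair of rows; A[::2] supplies the group label.
# Assumes every group has an even number of rows, so no pair straddles two groups.
result = [(a, (c0 + c1) / 2) for a, c0, c1 in zip(A[::2], C[::2], C[1::2])]
print(result)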

The C[::2] slice notation means "I want every other element of the list, starting from the beginning", while C[1::2] means "I want every other element, starting from C[1]".
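Applied to the C values from the question, the two slices pick out the first and second row of each consecutive pair:

```python
C = [2, 3, 4, 2, 4, 5, 6, 7, 1, 5, 6, 4]

firsts = C[::2]   # positions 0, 2, 4, ...: first row of each pair
seconds = C[1::2]  # positions 1, 3, 5, ...: second row of each pair

print(firsts)   # [2, 4, 4, 6, 1, 6]
print(seconds)  # [3, 2, 5, 7, 5, 4]
```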

The zip function takes multiple sequences and returns a list (or, in Python 3, an iterator) of tuples, taking one element at a time from each sequence.
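Zipping the two slices pairs rows by global position, so it only works for n = 2 and when every group has an even number of rows. A sketch that generalizes to any n and keeps each chunk inside its group, using `itertools.groupby` from the standard library (the function name `chunk_means` is hypothetical):

```python
from itertools import groupby

A = ['1', '1', '1', '1', '1', '1', '2', '2', '2', '2', '2', '2']
C = [2, 3, 4, 2, 4, 5, 6, 7, 1, 5, 6, 4]


def chunk_means(keys, values, n):
    """Hypothetical helper: mean of values over consecutive chunks of n items within each key run."""
    out = []
    # itertools.groupby only merges *consecutive* equal keys, so rows must be contiguous by key
    for key, rows in groupby(zip(keys, values), key=lambda kv: kv[0]):
        vals = [v for _, v in rows]
        for start in range(0, len(vals), n):
            chunk = vals[start:start + n]
            # The last chunk of a group may be shorter than n; average over its actual length
            out.append((key, sum(chunk) / len(chunk)))
    return out


print(chunk_means(A, C, 2))
```

This stays in pure Python, so for thousands of groups a vectorized pandas approach is likely to be faster; the sketch mainly shows how to respect group boundaries.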