0

Imagine I have a DataFrame like the following in IPython:

import pandas as pd

df = pd.DataFrame({
    'A': ['1', '1', '1', '1', '1', '1', '2', '2', '2', '2', '2', '2'],
    'B': ['00:00', '00:10', '00:20', '00:30', '01:10', '01:20', '00:00', '00:10', '00:20', '00:30', '01:10', '01:20'],
    'C': [2, 3, 4, 2, 4, 5, 6, 7, 1, 5, 6, 4],
})

[Image: the desired result is shown on the right side]

What I need is the result on the right side: group by A, then compute the mean over every 2 (n) rows of each group. I need to do this for a very large data set with more than 4K groups.

I tried to use pandas, and I think it could be a useful library for this.

Thomas K
KOrrosh Sh
  • Did you try anything at all? – sgp Jun 15 '15 at 07:37
  • Yes, I have tried: grouped = df.groupby('A'), then I can access each group this way: group1 = grouped.get_group('1'), then get every 2 elements of a group by using group1[0:2], and then run it in a for loop, which takes a lot of time! – KOrrosh Sh Jun 15 '15 at 07:42
  • Your answer probably lies in learning [what you can do with a DataFrame](http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.html) (obviously not a short task). – leewz Jun 15 '15 at 09:51

2 Answers

1

This solution works with your example:

df.groupby(['A',(df.index/2).astype(int)])['C'].mean()

Edit: a more versatile solution, independent of the index:

# rank() starts at 1, so subtract 1 before dividing to get the pairs 0, 0, 1, 1, ...
g1 = ((df.groupby(['A'])['B'].rank() - 1) / 2).astype(int)
df.groupby(['A', g1])['C'].mean()
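The same grouping idea can also be sketched with `cumcount()`, which numbers the rows inside each group by position instead of ranking column B. This is only a minimal sketch under assumptions: the chunk size `n` and the names `chunk` and `result` are made up for illustration, the mean is taken over C, and rows are assumed to already be in the desired order within each group.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['1'] * 6 + ['2'] * 6,
    'B': ['00:00', '00:10', '00:20', '00:30', '01:10', '01:20'] * 2,
    'C': [2, 3, 4, 2, 4, 5, 6, 7, 1, 5, 6, 4],
})

n = 2  # rows per chunk (assumption: the "2(n)" from the question)

# cumcount() numbers rows 0, 1, 2, ... within each group of A, in their current order;
# integer division by n turns that into a chunk id 0, 0, 1, 1, 2, 2, ...
chunk = (df.groupby('A').cumcount() // n).rename('chunk')

# mean of C for every n consecutive rows inside each group
result = df.groupby(['A', chunk])['C'].mean()
print(result)
```

If a group's length is not a multiple of `n`, the last chunk of that group simply averages the leftover rows.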
steboc
  • It looks like a nice answer. I tried it, but the result was not correct. I wrote a loop over all of them, and it does not work badly. I will try to reuse yours... – KOrrosh Sh Jun 15 '15 at 10:41
0

I don't know Pandas, but here's Python.

A = ['1', '1', '1', '1', '1', '1', '2', '2', '2', '2', '2', '2']
B = ['00:00', '00:10', '00:20', '00:30', '01:10', '01:20', '00:00', '00:10', '00:20', '00:30', '01:10', '01:20']
C = [2, 3, 4, 2, 4, 5, 6, 7, 1, 5, 6, 4]

# Pair each even-positioned value of C with the one after it and average them.
# This assumes every group in A has an even number of rows.
means = [(a, (c0 + c1) / 2) for a, c0, c1 in zip(A[::2], C[::2], C[1::2])]

The C[::2] slice notation means "I want every other element of the list, starting from the beginning", while C[1::2] means "I want every other element, starting from C[1]".

The zip function takes multiple sequences and returns a list (or, in Python 3, an iterator) of tuples, taking one element at a time from each sequence.
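For chunks of n rows rather than pairs, the slice-and-zip idea could be generalized roughly as follows. This is a sketch, not a tuned solution: the helper name `chunked_means` is hypothetical, the mean is assumed to be taken over C, and `itertools.groupby` only merges consecutive equal keys, so the rows are assumed to be already sorted by A.

```python
from itertools import groupby

A = ['1', '1', '1', '1', '1', '1', '2', '2', '2', '2', '2', '2']
C = [2, 3, 4, 2, 4, 5, 6, 7, 1, 5, 6, 4]


def chunked_means(keys, values, n=2):
    """Mean of every n consecutive values within each run of equal keys."""
    out = []
    # groupby yields runs of consecutive equal keys (keys assumed contiguous)
    for key, run in groupby(zip(keys, values), key=lambda kv: kv[0]):
        vals = [v for _, v in run]
        # walk the run in steps of n; the last chunk may be shorter than n
        for i in range(0, len(vals), n):
            chunk = vals[i:i + n]
            out.append((key, sum(chunk) / len(chunk)))
    return out


means = chunked_means(A, C, n=2)
print(means)
# [('1', 2.5), ('1', 3.0), ('1', 4.5), ('2', 6.5), ('2', 3.0), ('2', 5.0)]
```

Unlike the fixed pairing with zip, this version never mixes rows from two different groups, even when a group has an odd number of rows.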

Community
leewz