Python Pandas groupby row data manipulation

Question

I am new to Python and pandas and am still learning both. I have a basic question about groupby on a pandas DataFrame. I have a DataFrame and want to compute the following from it:

  SH    TH  QH  RH
  S1    B   10  5
  S2    B   12  8
  S1    B   5   8
  S1    S   5   10
  S1    S   3   12

I want something like this as an intermediate result:

  SH    TH  QH  RH
  S1    B   15  6
  S1    S   8   10.75
  S2    B   12  8

And this as the final result:

  SH    TH  QH  RH
  S1    B   7   6
  S2    B   12  8

I would like to know the best way to do this in pandas.

Thanks, Nand

Can you tell us what you are trying to do for each group/column? – Phik, Jan 17 '18 at 16:12
To be more specific: group by ['SH', 'TH']. For the intermediate result, QH is the sum within each B or S group and RH is the mean weighted by QH within each B or S group. For the final result, QH is the difference between B and S. – nandneo, Jan 17 '18 at 16:25
You have to be more specific about how many different values exist in SH. (Your example already shows that not every SH value has a TH=S row.) – Phik, Jan 17 '18 at 16:36
There may be many values of SH, but for each SH, TH can only be B or S. Thanks – nandneo, Jan 17 '18 at 16:50

Phik · Accepted Answer · 2018-01-17T17:11:51.500

Following this answer for the weighted mean, you can get the intermediate result this way:

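import numpy as np
# wm: mean of a group's values, weighted by QH of the same rows (df is the dataframe from the question)
wm = lambda x: np.average(x, weights=df.loc[x.index, "QH"])
df.groupby(['SH', 'TH'], as_index=False).agg({"QH":"sum", "RH":wm})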
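
As a minimal end-to-end sketch of the step above, the snippet below rebuilds the sample data from the question and runs the same aggregation. The names `weighted_mean` and `intermediate` are chosen here for illustration and are not part of the answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "SH": ["S1", "S2", "S1", "S1", "S1"],
    "TH": ["B", "B", "B", "S", "S"],
    "QH": [10, 12, 5, 5, 3],
    "RH": [5, 8, 8, 10, 12],
})

def weighted_mean(rh):
    # rh holds the RH values of one (SH, TH) group; its index still points
    # at the original rows, so the matching QH weights can be looked up in df.
    return np.average(rh, weights=df.loc[rh.index, "QH"])

intermediate = (
    df.groupby(["SH", "TH"], as_index=False)
      .agg({"QH": "sum", "RH": weighted_mean})
)
print(intermediate)
```

For the group (S1, B) this gives QH = 10 + 5 = 15 and RH = (10*5 + 5*8) / 15 = 6, which matches the intermediate table in the question.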

Edit: to get your full result:

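def nand_apply(f):
  # f is the sub-dataframe for one SH value; aggregate it per TH (B sorts before S)
  tmp = f.groupby('TH', as_index=False).agg({"QH":"sum", "RH":wm})

  if len(tmp) > 1:
    # both B and S exist: first row becomes B minus S, second row becomes NaN
    tmp['QH'] = tmp['QH'].transform(lambda x: x.diff(-1))

  # keep only the first row (the B row when there is one)
  return tmp.iloc[0]

df.groupby(['SH']).apply(nand_apply)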

(Note that this relies heavily on column TH having only two key values, B and S.)
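
For the B minus S step on its own, one possible alternative is to spread TH into columns and subtract. This is only a sketch for the QH column, not the answer's method, and it makes a different assumption: an SH value with no S rows is treated as S = 0, and one with no B rows would come out negative, whereas the function above returns that single row unchanged. The names `qh` and `net_qh` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "SH": ["S1", "S2", "S1", "S1", "S1"],
    "TH": ["B", "B", "B", "S", "S"],
    "QH": [10, 12, 5, 5, 3],
    "RH": [5, 8, 8, 10, 12],
})

# Sum QH per (SH, TH), then turn the TH level into columns.
# Combinations that do not occur (here S2/S) are filled with 0.
qh = (
    df.groupby(["SH", "TH"])["QH"].sum()
      .unstack(fill_value=0)
      .reindex(columns=["B", "S"], fill_value=0)  # guard: make sure both columns exist
)

# Net quantity per SH: B minus S.
net_qh = qh["B"] - qh["S"]
print(net_qh)
```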

edited Jan 17 '18 at 17:11

answered Jan 17 '18 at 16:34

Phik


Thanks @Phik, that solves my problem. Now I would like to understand the technical details behind it, in particular how the rows are processed. This was just one of the problems at hand and there are many similar ones, so if I understand it I can try to solve them on my own. Thanks in advance. – nandneo Jan 18 '18 at 06:19
Calling `apply` after a groupby lets you apply a function to the "sub data frame" of each group. Inside the apply (so for each data frame that corresponds to one `SH` group), you first do the column-wise computation with `agg` and then the B-S part with `transform` (the `if` covers the case where there is no S). If you want an excellent introduction to pandas, I can recommend https://github.com/jakevdp/PythonDataScienceHandbook (the chapter on pandas). – Phik Jan 19 '18 at 13:13
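
To see the "sub data frames" mentioned in the comment above, one option is to iterate over the groupby object directly. This is a small sketch on the question's sample data; the dictionary `rows_per_group` is an illustrative name. The pieces printed here are roughly what `apply` hands to the function one at a time, although depending on the pandas version `apply` may leave the grouping column out of the piece it passes.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "SH": ["S1", "S2", "S1", "S1", "S1"],
    "TH": ["B", "B", "B", "S", "S"],
    "QH": [10, 12, 5, 5, 3],
    "RH": [5, 8, 8, 10, 12],
})

rows_per_group = {}
# Iterating over a groupby yields (key, sub-DataFrame) pairs.
for key, sub in df.groupby("SH"):
    print(f"--- group {key} ---")
    print(sub)  # the rows keep their original index labels
    rows_per_group[key] = sub.index.tolist()

print(rows_per_group)
```

Because each piece keeps the original index labels, a lookup such as `df.loc[x.index, "QH"]` inside the weighted-mean function still finds the right weights.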
