
Similar to this question, I have a feature 'preWeight' that has multiple observations for each MotherID. I want to transform this dataframe into a new dataframe where:

  • I assign preWeight a value of "Yes" if any observation for a particular MotherID has preWeight>=4000, regardless of the remaining observations
  • Otherwise, if every observation for a particular MotherID has preWeight<4000, I assign preWeight a value of "No"

So I want to transform this dataframe:

    ChildID   MotherID   preWeight
0     20      455        3500
1     20      455        4040
2     13      102        2500
3     13      102        NaN
4     702     946        5000
5     82      571        2000
6     82      571        3500
7     82      571        3800

Into this:

    ChildID   MotherID   preWeight
0   20        455        Yes
1   13        102        No
2   702       946        Yes
3   82        571        No

I have tried this:

df.groupby('MotherID')['preWeight'].apply(
    lambda x: 'Yes' if x>4000 in x.values else 'No').reset_index()

But I am getting the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Thanks in advance.

sums22
  • What value is `preWeight` supposed to have if `preWeight` is once below 4000 and once above 4000 for the same `ChildID` and `MotherID`? – drops Jul 20 '20 at 14:41

1 Answer

Try this with pandas.DataFrame.any:

df.groupby(['ChildID','MotherID']).agg(lambda x: 'Yes' if (x>=4000).any() else 'No').reset_index()

Output:

   ChildID  MotherID preWeight
0       13       102        No
1       20       455       Yes
2       82       571        No
3      702       946       Yes
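
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and selects `preWeight` explicitly. It uses `>= 4000` to match the rule stated in the question, and `sort=False` is an addition of mine so that groups come out in order of first appearance, as in the desired output. Treat it as one possible way to write it, not the only one.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ChildID": [20, 20, 13, 13, 702, 82, 82, 82],
    "MotherID": [455, 455, 102, 102, 946, 571, 571, 571],
    "preWeight": [3500, 4040, 2500, np.nan, 5000, 2000, 3500, 3800],
})

# (s >= 4000) is a boolean Series; .any() reduces it to a single bool,
# which is what the `if` needs. NaN compares as False.
result = (
    df.groupby(["ChildID", "MotherID"], sort=False)["preWeight"]
      .agg(lambda s: "Yes" if (s >= 4000).any() else "No")
      .reset_index()
)
print(result)
```

The `ValueError` in the question comes from `x>4000` being evaluated on the whole group, which yields a Series rather than a single True/False; reducing it with `.any()` is what makes the condition usable in `if`.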
MrNobody33
  • I think your answer is missing preWeight, so it should be: df.groupby(['ChildID','MotherID'])['preWeight'].agg(lambda x: 'Yes' if (x>4000).any() else 'No').reset_index() – sums22 Jul 20 '20 at 21:16
  • Also, why did you use the agg function here not apply, what is the difference? – sums22 Jul 20 '20 at 21:21
  • It doesn't matter here. There were three columns, and grouping by two of them moves those into the index, so only one column is left to aggregate. Specifying the column explicitly therefore makes no difference in this case. @sums22 – MrNobody33 Jul 20 '20 at 21:26
  • [Here](https://stackoverflow.com/a/44864946/13676202) is the difference between agg and apply. But in this case, there wasn't a specific reason. Also, if it was helpful, consider [accepting the answer](https://meta.stackexchange.com/a/5235), thanks :). @sums22 – MrNobody33 Jul 20 '20 at 21:34
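
To make the agg/apply point from the comments concrete, here is a small sketch on a cut-down version of the question's data. The helper name `to_flag` is hypothetical. On a single selected column both methods hand the function one Series per group, so they should give the same labels; the difference shows up on a whole grouped DataFrame, where `agg` calls the function once per column per group, while `apply` passes each group's sub-DataFrame.

```python
import pandas as pd

# Cut-down sample: two mothers, one column to aggregate
df = pd.DataFrame({
    "MotherID": [455, 455, 102],
    "preWeight": [3500, 4040, 2500],
})

def to_flag(s):
    # hypothetical helper: reduce one group's values to a label
    return "Yes" if (s >= 4000).any() else "No"

grouped = df.groupby("MotherID")["preWeight"]
via_agg = grouped.agg(to_flag)      # function receives a Series per group
via_apply = grouped.apply(to_flag)  # same input here, same labels

# Without selecting a column, agg still works column by column,
# which is why the answer runs without ['preWeight'].
frame_agg = df.groupby("MotherID").agg(to_flag)
print(via_agg, via_apply, frame_agg, sep="\n\n")
```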