I have a dataframe that contains mother ids and multiple observations for the column (preDiabetes) as such:
ChildID MotherID preDiabetes
0 20 455 No
1 20 455 Not documented
2 13 102 NaN
3 13 102 Yes
4 702 946 No
5 82 571 No
6 82 571 Yes
7 82 571 Not documented
I want to transform the multiple observational feature (preDiabetes) into one with single observations for each MotherID.
To do this, I will create a new dataframe with feature newPreDiabetes and:
- assign newPreDiabetes a value of "Yes" if preDiabetes=="Yes" for a particular MotherID regardless of the remaining observations
- . Otherwise if preDiabetes != "Yes" for a particular MotherID, I will assign newPreDiabetes a value of "No"
Therefore, my new dataframe will have single observation for the feature preDiabetes and unique MotherIDs as such:
ChildID MotherID newPreDiabetes
0 20 455 No
1 13 102 Yes
2 702 946 No
3 82 571 Yes
I am new to Python and Pandas, so I am not sure what the best way to achieve this is, but this is what I have tried so far:
# get list of all unique mother ids
uniqueMotherIds = pd.unique(df[['MotherID']].values.ravel())
# create new dataframe that will contain unique MotherIDs and single observations for newPreDiabetes
newDf = {'MotherID','newPreDiabetes' }
# iterate through list of all mother ids and look for preDiabetes=="Yes"
for id in uniqueMotherIds:
filteredDf= df[df['MotherID'] == id].preDiabetes=="Yes"
result = pd.concat([filteredDf, newDf])
The code is not yet complete and I would appreciate some help as I am not sure if I am on the right track!
Many thanks :)