Transforming multiple observational feature to single observational feature in Pandas Dataframe Python

Question

I have a dataframe that contains mother ids and multiple observations for the column (preDiabetes) as such:

    ChildID   MotherID   preDiabetes
0     20      455        No
1     20      455        Not documented
2     13      102        NaN
3     13      102        Yes
4     702     946        No
5     82      571        No
6     82      571        Yes
7     82      571        Not documented

I want to transform the multiple observational feature (preDiabetes) into one with single observations for each MotherID.

To do this, I will create a new dataframe with feature newPreDiabetes and:

assign newPreDiabetes a value of "Yes" if preDiabetes=="Yes" for a particular MotherID regardless of the remaining observations
. Otherwise if preDiabetes != "Yes" for a particular MotherID, I will assign newPreDiabetes a value of "No"

Therefore, my new dataframe will have single observation for the feature preDiabetes and unique MotherIDs as such:

    ChildID   MotherID   newPreDiabetes
0   20        455        No
1   13        102        Yes
2   702       946        No
3   82        571        Yes

I am new to Python and Pandas, so I am not sure what the best way to achieve this is, but this is what I have tried so far:


    # get list of all unique mother ids
    uniqueMotherIds = pd.unique(df[['MotherID']].values.ravel())
    
    # create new dataframe that will contain unique MotherIDs and single observations for newPreDiabetes
    newDf = {'MotherID','newPreDiabetes' }
    
    # iterate through list of all mother ids and look for preDiabetes=="Yes"
    for id in uniqueMotherIds:
        filteredDf= df[df['MotherID'] == id].preDiabetes=="Yes"
        result = pd.concat([filteredDf, newDf])

The code is not yet complete and I would appreciate some help as I am not sure if I am on the right track!

Many thanks :)

mujjiga · Accepted Answer · 2020-07-20T16:02:54.127

df = pd.DataFrame({
        'MotherID': [455, 455,102,102,946,571,571,571],
        'preDiabetes' : ['No','Not documented', np.NaN,
                         'Yes', 'No','No','Yes','Not documented'],
        'ChildID' : [20,20,13,13,702,82,82,82]                   
                   })

result = df.groupby(['MotherID', 'ChildID'])['preDiabetes'].apply(list).reset_index()
result['newPreDiabetes'] = result['preDiabetes'].apply(
    lambda x: 'Yes' if 'Yes' in x else 'No')
result = result.drop(columns=['preDiabetes'])

Output:


   MotherID ChildID newPreDiabetes
0   102     13      Yes
1   455     20      No
2   571     82      Yes
3   946     702     No

That works thanks! I have another column (ChildID) which I didn't include in the MWE above, how do I include it as part of the output within manipulating it at all? — sums22, Jul 20 '20 at 14:00

Transforming multiple observational feature to single observational feature in Pandas Dataframe Python

1 Answers1

Linked