I have a big pandas Dataframe with fictional persondata. The below is a small example - each person is defined by a number.
import pandas as pd
import numpy as np
df = pd.DataFrame({ 'Number':["5569", "3385", "9832", "6457", "5346", "5462", "9873", "2366"] , 'Gender': ['Male', 'Male', 'Female', 'Male', 'Female', 'Female', 'Male', 'Female'], 'Children': [np.nan, "5569 6457", "5569", np.nan, "6457", "2366", "2366", np.nan]})
df
Number Gender Children
0 5569 Male NaN
1 3385 Male 5569 6457
2 9832 Female 5569
3 6457 Male NaN
4 5346 Female 6457
5 5462 Female 2366
6 9873 Male 2366
7 2366 Female NaN
Some of the people are the children of some of the others. Now I want to make two columns "Mother" and "Father" and fill them with the relevant numbers. I would get those by looking at the "Children" column and then adding someone as the father if they are a male and has the number of the child in "Children" and the same for females as mothers. However, some of the values are NaN and some people have multiple children (they can have more than 4 in the actual dataset).
I've been trying with .isin and similar but I simply can't get it to work.
They expected output for this example would look like this:
df = pd.DataFrame({ 'Number':["5569", "3385", "9832", "6457", "5346", "5462", "9873", "2366"] , 'Gender': ['Male', 'Male', 'Female', 'Male', 'Female', 'Female', 'Male', 'Female'], 'Children': [np.nan, "5569 6457", "5569", np.nan, "6457", "2366", "2366", np.nan], 'Mother':[9832, np.nan, np.nan,"5346", np.nan, np.nan, np.nan, "5462"], 'Father':["3385", np.nan, np.nan, "3385", np.nan, np.nan, np.nan, "9873"]})
df
Number Gender Children Mother Father
0 5569 Male NaN 9832 3385
1 3385 Male 5569 6457 NaN NaN
2 9832 Female 5569 NaN NaN
3 6457 Male NaN 5346 3385
4 5346 Female 6457 NaN NaN
5 5462 Female 2366 NaN NaN
6 9873 Male 2366 NaN NaN
7 2366 Female NaN 5462 9873