0

I have been trying to create a dataframe that contains the ID of a character and the ID of the characters mother(if they have one). The data I am using is of GoT characters and the raw df can be seen below.

dataframe1

I have created a dataframe that stores all characters names and the name of their mother (if they have one) however im looking for the character's and mother's ID not their name.

dataframe2

I have tried using a merge to solve this but so far have not been able to accomplish.

Any advice would be much appreciated.

  • 3
    Welcome to Stackoverflow. Please [don't post images/links to images of code/data](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-errors-when-asking-a-question) instead add them in text format so that we could be able to copy these while trying to answer your question. Please take some time to read [How to ask good pandas questions?](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Shubham Sharma May 08 '21 at 18:10

1 Answers1

0

I've tried to reproduce what you asked for with a dataframe with two lines:

dataframe1 = pd.DataFrame({'ID': ['C0000', 'Z000'], 'name': ['Viserys II Targaryen', 'Rhaenira Targaryen'], 'male': [1,0], 'mother': ['Rhaenira Targaryen', np.nan]})

dataframe2 = dataframe1[['name', 'mother']]

Then we merge the two dataframes, dataframe1 and dataframe2:

merged_df = dataframe2.merge(dataframe1[['ID', 'name']], how = 'outer', on = 'name')

The result is the following:

name                  mother              ID
Viserys II Targaryen  Rhaenira Targaryen  C0000
Rhaenira Targaryen    NaN                 Z0000

Then we merge merged_df and dataframe1:

final = merged_df.merge(dataframe1[['ID', 'name']], how = 'left', left_on = 'mother', right_on = 'name')

Finally the result is:

name_x                mother              ID_x  ID_y  name_y
Viserys II Targaryen  Rhaenira Targaryen  C0000 Z0000 Rhaenira Targaryen
Rhaenira Targaryen    NaN                 Z0000 NaN   NaN

The column name_y can be removed and you can rename the columns ID_x and ID_y as you see fit.