0

I have two different datasets

df1
    Name Surname Age  Address
    Julian Ross   34  Main Street
    Mary   Jane   52  Cook Road

len(1200)

df2
    Name     Country   Telephone

    Julian   US         NA

len(800)

df1 contains the full list of unique names; df2 contains less rows as many Name were not added. I would like to get a final dataset with the full list of names in df1 (and all the fields that are there) plus the fields in df2. I would then expect a final dataset of length 1200 with some empty fields corresponding to the missing name in df2. I have tried as follows:

pd.concat([df1.set_index('Name'),df2.set_index('Name')], axis=1, join='inner')

but it returns the length of the smallest dataset (i.e. 800). I have also tried

df1.merge(df2, how = 'inner', on = ['Name'])

... same result.

I am not totally familiar with joining/merging/concatenating functions, even after reading the document https://pandas.pydata.org/docs/user_guide/merging.html . I know that probably this question will be a duplicate of some others and I will be happy to delete it if necessary, but I would be really grateful if you could provide same help and explaining how to get the expected result:

df

Name Surname Age Address Country Telephone
Julian Ross  34   Main Street US   NA
Mary Jane    52   Cook Road
LdM
  • 674
  • 7
  • 23

1 Answers1

1

IIUC, Use pd.merge like below:

>>> df1.merge(df2, how='left', on='Name')

     Name Surname  Age      Address Country  Telephone
0  Julian    Ross   34  Main Street      US        NaN
1    Mary    Jane   52    Cook Road     NaN        NaN

If you want to keep the number of rows of df1, you have to use how='left' in the case where there is no duplicate names in df2.

Read Pandas Merging 101

Corralien
  • 109,409
  • 8
  • 28
  • 52