0

Whenever there is a matching column/key between dataframe dfa and dfb, I want to change the name of the country in dataframe a with match in dataframe b.

dfa data:

Country
Bolivia
Bosnia-Herzegovina



dfb data: (full and correct data)

ISO Country
BOL 'Bolivia, Plurinational State of
BIH Bosnia and Herzegovina

First I tried to merge the tables dfa and dfb based on country and any then do a string replace based on like but for some unknown reason any strings with commas are not matching still

    dfa.ix[dfa.Country=='Bolivia', 'Country'] = 'Bolivia, Plurinational State of'

This works in that the replacement happens, but when the data is compared to the second table which contains this exact string, it doesn't see it as equal so does not update the null when I merge the tables.

   dfab_merge_null = dfab_merge.loc[dfab_merge['ISO'].isnull()] 
   dfab_merge_null.Country.unique()

array(['Azores', 'Bolivia, Plurinational State of', 'Canary Is', 'Iran, Islamic Republic of', "Korea, Democratic People's Republic of", 'Korea, Republic of', 'Macedonia, The Former Yugoslav Republic of', 'Micronesia\, Federated States of', 'Moldova, Republic of', 'Palestine (West Bank)', 'St Kitts and Nevis'... etc



dfa does not contain the ISO column, which is what I want to add ultimately. dfb contains the ISO that I'm trying to transfer into dfa based on loosely matching string names in Country between the two tables.

I'm unsure what other method to try to update. I've been stuck on this for over a day and would appreciate any help. Thanks!

maric
  • 79
  • 1
  • 11
  • Sorry, above code contains a typo - – maric Jun 23 '16 at 18:24
  • dfa.ix[s1.Country=='Bolivia', 'Country'] = 'Bolivia, Plurinational State of' should be dfa.ix[dba.Country=='Bolivia', 'Country'] = 'Bolivia, Plurinational State of' but this has nothing to do with the problem and this is not the cause. – maric Jun 23 '16 at 18:24
  • Just update your question, instead of adding comments. You may want to check [this](http://stackoverflow.com/questions/13636848/is-it-possible-to-do-fuzzy-match-merge-with-python-pandas) – MaxU - stand with Ukraine Jun 23 '16 at 18:26

1 Answers1

0

If the two dataframes are lined up row wise or each have an index that already ensures that the correct rows will be matched, then this should work:

dfb.combine_first(dfa)

If it isn't lined up, meaning "Bolivia" is labeled row 0 in dfa and "Bolivia, Plurinational State of" is labeled row 11 in dfb then you have a whole other issue.

piRSquared
  • 285,575
  • 57
  • 475
  • 624