0

I attempting to replace values from a column in one dataframe based on values in another dataframe.

This is the first three rows of the column to be amended in my df:

enter image description here

This is the other dataframe called 'areas'. This has the info required to update my df.

enter image description here

In this example, the desired output would happen by chance to look just like the 'county' column of the areas df. But this mapping procedure would have to be carried out on over 40,000 columns and it will not follow this pattern as the values in my df comprise of random values of the areas df.

The code I have written so far, is the creation of a dictionary with the key being county:

districts = dict(zip(areas['County'], areas['CC']))

{'Aveiro': 'Águeda', 'Beja': 'Aljustrel', 'Braga': 'Vizela'}

I may be on the right track here, thing is I haven't been able to work out how to have the values taken from both the 'CC' and 'FR' columns. If I'm able to have a dictionary with the key being the county and values, the CC AND FR of the county, I will easily be able to map the DF. Of course I am very much open to other solutions!

Desired output:

enter image description here

LeoGER
  • 355
  • 1
  • 8

1 Answers1

1

How about a simple merge?

my_df = my_df.merge(areas, on = 'County', how = 'left')

EDIT:

If the County column from your first dataframe corresponds to the FR column from the areas dataframe (which seems to be the case here), yu should do:

my_df = my_df.merge(areas, left_on = 'County', right_on = 'FR', how = 'left')

EDIT#2:

Based on further information provided:

# assuming only one column will match with first dataframe `County` column
import numpy as np

df['in_county'] = df['County'].map(dict(zip(areas.County, areas.County)))
df['in_cc'] = df['County'].map(dict(zip(areas.CC, areas.County)))
df['in_fr'] = df['County'].map(dict(zip(areas.FR, areas.County)))
df.drop('County', axis=1, inplace = True)
output = df.replace(np.nan, '').apply(lambda x: ''.join(x), axis = 1)
output = pd.DataFrame(output, columns = ['County'])
  • Thanks! May be on the right path here but having ran the code, it's produced over 2million additional rows so I don't believe the problem is quite solved yet... – LeoGER Mar 30 '20 at 20:43
  • 1
    I have edited my answer based on the dataset samples you have provided. Hope it helps. – Bishwarup Bhattacharjee Mar 30 '20 at 20:50
  • Thanks for your edit! The `county` column from the first dataframe actually contains values at random from any one of the 3 columns in the second df. In the above example, the1st and 3rd values are from the `FR` column but the 2nd value is from the `CC` column. Further down the df however there are correct counties which do not need to be changed and can be found in the `County` column of the second df. – LeoGER Mar 30 '20 at 21:10
  • 1
    Please check if the edited solution is closer to what you are trying to achieve :) – Bishwarup Bhattacharjee Mar 30 '20 at 21:38
  • Hey, it looks like we're on the right track with this. The problem now is that 'output' produced a single column df with what looks like a merge of all the other values in the other columns, which are no longer there. My original df in fact has around 15 columns. I believe that this problem is a product of the lack of clarity in my question and possibly an error where I have adapted your code to suit the real dataframes in my project. I changed around a few things for the purpose of confidentiality. I will continue to work on this tomorrow morning. I'm very grateful for your help! – LeoGER Mar 30 '20 at 23:02