0

I am trying to replace some missing and incorrect values in my master dataset by filling it in with correct values from two different datasets.

I created a miniature version of the full dataset like so (note the real dataset is several thousand rows long):

import pandas as pd

data = {'From':['GA0251','GA5201','GA5551','GA510A','GA5171','GA5151'],
        'To':['GA0201_T','GA5151_T','GA5151_R','GA5151_V','GA5151_P','GA5171_B'],
        'From_Latitude':[55.86630869,0,55.85508787,55.85594626,55.85692217,55.85669934],
        'From_Longitude':[-4.27138731,0,-4.24126866,-4.24446585,-4.24516129,-4.24358251,],
        'To_Latitude':[55.86614756,0,55.85522197,55.85593762,55.85693878,0],
        'To_Longitude':[-4.271040979,0,-4.241466534,-4.244607602,-4.244905037,0]}
 
dataset_to_correct = pd.DataFrame(data)

However, some values in the From lat/long and the To lat/long are incorrect. I have two tables like the one below for each of From and To, which I would like to substitute into the table in place of the two values for that row.

Table of Corrected From lat/long:

data = {'Site':['GA5151_T','GA5171_B'],
        'Correct_Latitude':[55.85952791,55.87044558],
        'Correct_Longitude':[55.85661767,-4.24358251,]}
        
correct_to_coords = pd.DataFrame(data)

I would like to match this table to the From column and then replace the From_Latitude and From_Longitude with the correct values.

Table of Corrected To lat/long:

data = {'Site':['GA5201','GA0251'],
        'Correct_Latitude':[55.857577,55.86616756],
        'Correct_Longitude':[-4.242770,-4.272140979]}

correct_from_coords = pd.DataFrame(data)

I would like to match this table to the To column and then replace the To_Latitude and To_Longitude with the correct values.

Is there a way to match the site in each table to the corresponding From or To column and then replace only the values in the respective columns?

I have tried using code from this answer (Elegant way to replace values in pandas.DataFrame from another DataFrame) but it seems to have no effect on the database.

(correct_to_coords.set_index('Site').rename(columns = {'Correct_Latitude':'To_Latitude'})                        .combine_first(dataset_to_correct.set_index('To')))
  • hi! Is any one of the answers below working? If so & if you wish, you might consider [accepting](https://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work/5235#5235) one of them to signal others that the issue is resolved. If not, you can provide feedback so they can be improved (or removed altogether) – Anurag Dabas Aug 14 '21 at 06:06

3 Answers3

1
merge = dataset_to_correct.merge(correct_to_coords, left_on='To', right_on='Site', how='left')

merge.loc[(merge.To == merge.Site), 'To_Latitude'] = merge.Correct_Latitude
merge.loc[(merge.To == merge.Site), 'To_Longitude'] = merge.Correct_Longitude

# del merge['Site']
# del merge['Correct_Latitude']
# del merge['Correct_Longitude']
merge = merge.drop(columns = ['Site','Correct_Latitude','Correct_Longitude'])

merge = merge.merge(correct_from_coords, left_on='From', right_on='Site', how='left')

merge.loc[(merge.From == merge.Site), 'From_Latitude'] = merge.Correct_Latitude
merge.loc[(merge.From == merge.Site), 'From_Longitude'] = merge.Correct_Longitude

# del merge['Site']
# del merge['Correct_Latitude']
# del merge['Correct_Longitude']
merge = merge.drop(columns = ['Site','Correct_Latitude','Correct_Longitude'])

merge
zswqa
  • 826
  • 5
  • 15
1

@zswqa 's answer produces right result, @Anurag Dabas 's doesn't.

Another possible solution, It is a bit faster than merge method suggested above, although both are correct.

dataset_to_correct.set_index("To",inplace=True)
correct_to_coords.set_index("Site",inplace=True)
dataset_to_correct.loc[correct_to_coords.index, "To_Latitude"] = correct_to_coords["Correct_Latitude"]
dataset_to_correct.loc[correct_to_coords.index, "To_Longitude"] = correct_to_coords["Correct_Longitude"]
dataset_to_correct.reset_index(inplace=True)

dataset_to_correct.set_index("From",inplace=True)
correct_from_coords.set_index("Site",inplace=True)
dataset_to_correct.loc[correct_from_coords.index, "From_Latitude"] = correct_from_coords["Correct_Latitude"]
dataset_to_correct.loc[correct_from_coords.index, "From_Longitude"] = correct_from_coords["Correct_Longitude"]
dataset_to_correct.reset_index(inplace=True)

MySlav
  • 151
  • 5
0

lets try dual merge by merge()+pop()+fillna()+drop():

dataset_to_correct=dataset_to_correct.merge(correct_to_coords,left_on='To',right_on='Site',how='left').drop('Site',1)
dataset_to_correct['From_Latitude']=dataset_to_correct.pop('Correct_Latitude').fillna(dataset_to_correct['From_Latitude'])
dataset_to_correct['From_Longitude']=dataset_to_correct.pop('Correct_Longitude').fillna(dataset_to_correct['From_Longitude'])
dataset_to_correct=dataset_to_correct.merge(correct_from_coords,left_on='From',right_on='Site',how='left').drop('Site',1)
dataset_to_correct['To_Latitude']=dataset_to_correct.pop('Correct_Latitude').fillna(dataset_to_correct['To_Latitude'])
dataset_to_correct['To_Longitude']=dataset_to_correct.pop('Correct_Longitude').fillna(dataset_to_correct['To_Longitude'])
Anurag Dabas
  • 23,866
  • 9
  • 21
  • 41