0

I have two dataframes , First dataframe contains data from given location and the second dataframe has data with coordinates to identify those locations from first dataframe.

I want to compare column values of locations from first dataframe to county values from the second dataframe then attach the coordinates at the exact location name on the first dataframe. Here is the first dataframe:

import pandas as pd

boxes = {'ID': ['43001','43002','43003','43004','43005','43006','43007','43008'],
         'Location': ['Busia','Nairobi','Tharaka','Kakamega','Kajiado','Meru','NaN','Kiambu']
        }
df = pd.DataFrame(boxes, columns = ['ID', 'Location'])

print(df)

Second dataframe

import pandas as pd

coord = {'ID': ['456001','653002','783003','4533004','900005','4367006','643007','943008'],
         'county': ['Busia','Nairobi','Tharaka','Kakamega','Kajiado','Meru','NaN','Kiambu'],
         'geometry': ['POLYGON ((35.79593))','POLYGON ((355.79593))','POLYGON ((35.79593))','POLYGON ((535.79593))','POLYGON ((835.79593))','POLYGON ((735.79593))','POLYGON ((335.79593))','POLYGON ((635.79593))']
        }
df1 = pd.DataFrame(coord, columns = ['ID', 'county', 'geometry'])

print(df1)

Intention is compare Location values from first dataframe to County values from second dataframe. If the names are the same then attach geometry values for those specific locations on first dataframe.

End result to look like this

import pandas as pd

boxes = {'ID': ['43001','43002','43003','43004','43005','43006','43007','43008'],
         'Location': ['Busia','Nairobi','Tharaka','Kakamega','Kajiado','Meru','NaN','Kiambu'],
         'geometry': ['POLYGON ((35.79593)','POLYGON ((335.79593))','POLYGON ((35.79593))','POLYGON ((635.79593))', 'POLYGON ((835.79593))','POLYGON ((735.79593))','NaN','POLYGON ((535.79593))']
        }
df = pd.DataFrame(boxes, columns = ['ID', 'Location', 'geometry'])

print(df)

I tried this but didnt work well as expected

if df['Location'].values == df1['county'].values:
    
    
    df2 = pd.concat([df, df1['geometry']], axis=1)
df2
    
LivingstoneM
  • 1,088
  • 10
  • 28

1 Answers1

1

For this kind of operation, pandas have a method called merge(). It basically does the same as JOIN in SQL. For your example:

df.merge(df1[['county', 'geometry']], left_on='Location', right_on='county')

Note that I did not specify the type of join; by default, it is the inner join (only records found in both dataframes are retained). On the contrary, if you wanted to retain all records from the left-hand table (i.e. df), reagardless whether you find a matching county in the right-hand table (df1) you would call:

df.merge(df1[['county', 'geometry']], how= 'left', left_on='Location', right_on='county')

additional remarks:

  • pandas docs: merge
  • Bear in mind you are matching on strings (Location is matched to county). I can see you are having text values NaN in your data, which does not seem as a valid city name -> you may want to resolve its meaning before any merge.
Vojta F
  • 534
  • 3
  • 17