0

I have a look up table as a dataframe (1000 rows) consisting of codes and labels. I have another dataframe (2,00,000 rows) consisting of codes and geometries.

I need to get label names for each corresponding code by looking in the look up dataframe.

Output should be dataframe.

I tried it as follows.

df = pd.read_csv(filepath)

codes = df['codes'].values
labels = df['labels'].values 


df2 = pd.read_csv(filepath)
print (df2.shape)    

for ix in df2.index:
    code = df2.loc[ix, 'code']
    df2.loc[ix, 'label'] = labels[codes==code][0]   

   
print (df2)

Result is correct, but it's very slow... for looping is very slow

Can you help me?

gislady
  • 3
  • 2

1 Answers1

1

You should use the merge method of DataFrames (https://pandas.pydata.org/docs/reference/api/pandas.merge.html). It allows to join two dataframes based on a common column. Your code should look like this:

df2 = df2.merge(df, left_on="code", right_on="codes", how="left")
# Check labels using df2["labels"]

The common column name is specified in the parameters left_on and right_on. The parameter how='left' indicates that all the rows from df2 are preserved even if there is no code for a row.

Sergio Peñafiel
  • 446
  • 5
  • 13