0

I have 5 million records with address field in df['address'], and Top 1000 UK cities info in cities[['City', 'Region', 'Population']].

I want to make new columns for region and population. My current command is too slow: string contains for all 1000 cities in all 5 million records means 5 billion boolean calculations.

Is there some way to do a better lookup here?

This code works but is very slow.

df['Region'], df['Pop'] = np.nan, np.nan

for i in range(0, 1000):
    df.loc[df['City'].str.contains(cities.iloc[i, 0]),
           ['Region', 'Pop']] = cities.iloc[i, [1, 2]].values
mozway
  • 194,879
  • 13
  • 39
  • 75
Robbie G
  • 11
  • 2

0 Answers0