I have 5 million records with address field in df['address'], and Top 1000 UK cities info in cities[['City', 'Region', 'Population']].
I want to make new columns for region and population. My current command is too slow: string contains for all 1000 cities in all 5 million records means 5 billion boolean calculations.
Is there some way to do a better lookup here?
This code works but is very slow.
df['Region'], df['Pop'] = np.nan, np.nan
for i in range(0, 1000):
df.loc[df['City'].str.contains(cities.iloc[i, 0]),
['Region', 'Pop']] = cities.iloc[i, [1, 2]].values