I have multiple simple functions that need to be applied to every row of certain columns of my dataframe. The dataframe is very large (10 million+ rows). It looks something like this:
    Date       location  city         number  value
    12/3/2018  NY        New York     2       500
    12/1/2018  MN        Minneapolis  3       600
    12/2/2018  NY        Rochester    1       800
    12/3/2018  WA        Seattle      2       400
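For anyone who wants to reproduce this, here is a minimal sketch that builds the sample rows as a DataFrame. It assumes the dates are kept as plain strings and that the columns have the names shown in the table; the real data has 10 million+ rows.

```python
import pandas as pd

# Sample rows copied from the table above.
# Assumption: "Date" is left as a string column rather than parsed to datetime.
df = pd.DataFrame({
    "Date": ["12/3/2018", "12/1/2018", "12/2/2018", "12/3/2018"],
    "location": ["NY", "MN", "NY", "WA"],
    "city": ["New York", "Minneapolis", "Rochester", "Seattle"],
    "number": [2, 3, 1, 2],
    "value": [500, 600, 800, 400],
})
```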
I have functions like these:
    def normalized_location(row):
        # Map a row's city to a location code; anything unlisted becomes "Other".
        if row['city'] == "Minneapolis":
            return "FCM"
        elif row['city'] == "Seattle":
            return "FCS"
        else:
            return "Other"
and then I use:
    df['Normalized Location'] = df.apply(lambda row: normalized_location(row), axis=1)
This is extremely slow. How can I make it more efficient?
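
For reference, one commonly suggested direction is to replace the row-wise `apply` with a vectorized lookup on the column. The following is only a sketch, assuming the result depends on the `city` column alone; the dictionary name `city_to_code` is made up for illustration, and the tiny frame stands in for the real one.

```python
import pandas as pd

# Stand-in for the real frame; only the column the mapping needs.
df = pd.DataFrame({"city": ["New York", "Minneapolis", "Rochester", "Seattle"]})

# Hypothetical lookup table mirroring the if/elif branches of normalized_location.
city_to_code = {"Minneapolis": "FCM", "Seattle": "FCS"}

# Series.map looks up every value in one pass; cities not in the dict
# become NaN, which fillna turns into the "Other" fallback.
df["Normalized Location"] = df["city"].map(city_to_code).fillna("Other")
```

This avoids calling a Python function once per row, which is usually where the time goes with `axis=1`. Whether it is fast enough at 10 million+ rows would need to be measured on the actual data.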