I have a DataFrame with a column of string values. I need to replace each value in that column with the result of a function, and I'd like to do this without iterating over thousands of rows. The function takes a term and returns the approved new value for that term.
For example, getPreferredTerm('STAINED') returns 'DISCOLORED', so every 'STAINED' value in the P_TERM column should be replaced by 'DISCOLORED'.
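For a single known pair, the replacement described above can be sketched with pandas' `Series.replace` and a dict. The sample data here is made up for illustration; values without an entry in the dict are left unchanged.

```python
import pandas as pd

# Made-up sample column for illustration
df = pd.DataFrame({'P_TERM': ['STAINED', 'TORN', 'STAINED']})

# replace() swaps every exact match of a dict key for its value
df['P_TERM'] = df['P_TERM'].replace({'STAINED': 'DISCOLORED'})
print(df['P_TERM'].tolist())  # ['DISCOLORED', 'TORN', 'DISCOLORED']
```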
I'm struggling to do this with NumPy. My attempt was:

    df['P_TERM'] = getPreferredTerm(df['P_TERM'])

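Passing the whole column makes `obsData['STRESC'] == stresc` compare two Series. Unless both happen to share the same index, pandas typically raises a `ValueError` ("Can only compare identically-labeled Series objects"), and a broad `except Exception` turns that into 'UNMAPPED'. A minimal, correct (though not vectorized) baseline is `Series.map`, which calls the scalar function once per element. The `obsData` contents below are hypothetical, and this sketch's function catches only `IndexError` (the no-match case) so that other errors surface.

```python
import pandas as pd

# Hypothetical lookup table standing in for the real obsData
obsData = pd.DataFrame({
    'STRESC': ['STAINED', 'TORN'],
    'P_TERM': ['DISCOLORED', 'DAMAGED'],
})

def getPreferredTerm(stresc):
    try:
        return obsData.loc[obsData['STRESC'] == stresc].iloc[0]['P_TERM']
    except IndexError:
        # .iloc[0] on an empty selection means the term is not in obsData
        return 'UNMAPPED'

df = pd.DataFrame({'P_TERM': ['STAINED', 'TORN', 'FADED']})

# map() calls getPreferredTerm once per element (still a Python-level loop)
df['P_TERM'] = df['P_TERM'].map(getPreferredTerm)
print(df['P_TERM'].tolist())  # ['DISCOLORED', 'DAMAGED', 'UNMAPPED']
```

This scans `obsData` once per row, so it gets slow for large inputs; it is mainly useful to check results against a faster version.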
The getPreferredTerm function is as follows:

    def getPreferredTerm(stresc):
        # NOTE: obsData is a DataFrame containing legacy terms in a
        # column called 'STRESC' and preferred terms in a column
        # called 'P_TERM', so this function takes a legacy term as input
        # and returns the preferred term.
        try:
            pterm = obsData.loc[obsData['STRESC'] == stresc].iloc[0]['P_TERM']
        except Exception:
            # No matching legacy term (or any other error) falls back to 'UNMAPPED'
            pterm = 'UNMAPPED'
        return pterm

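One way to vectorize the lookup is to build the legacy-to-preferred mapping once from `obsData` and pass it to `Series.map`, which performs an index-based lookup for the whole column instead of filtering `obsData` per row. This is a sketch assuming the column names used in the function above; the sample data is hypothetical. Note that a `P_TERM` value that is itself missing in `obsData` would also end up as 'UNMAPPED' here.

```python
import pandas as pd

# Hypothetical data; note the duplicate 'STAINED' row in obsData
obsData = pd.DataFrame({
    'STRESC': ['STAINED', 'TORN', 'STAINED'],
    'P_TERM': ['DISCOLORED', 'DAMAGED', 'SOILED'],
})
df = pd.DataFrame({'P_TERM': ['STAINED', 'TORN', 'FADED', 'STAINED']})

# Build the lookup once. drop_duplicates keeps the first row per STRESC,
# mirroring the .iloc[0] in getPreferredTerm, and gives map() a unique index.
lookup = obsData.drop_duplicates('STRESC').set_index('STRESC')['P_TERM']

# Terms with no entry in the lookup come back as NaN and become 'UNMAPPED'
df['P_TERM'] = df['P_TERM'].map(lookup).fillna('UNMAPPED')
print(df['P_TERM'].tolist())
# ['DISCOLORED', 'DAMAGED', 'UNMAPPED', 'DISCOLORED']
```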
Is it possible to vectorize this function so that I can pass it a Series instead of a single value?
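If a function that accepts a Series and returns a Series is preferred, a left merge against the deduplicated `obsData` is one possible approach. The name `getPreferredTerms` is hypothetical, and `obsData` is passed in explicitly rather than read as a global; both are assumptions of this sketch, not part of the original code.

```python
import pandas as pd

def getPreferredTerms(stresc: pd.Series, obsData: pd.DataFrame) -> pd.Series:
    """Hypothetical Series-in/Series-out version of getPreferredTerm."""
    # One row per legacy term; keeping the first matches the .iloc[0] behaviour
    mapping = obsData.drop_duplicates('STRESC')[['STRESC', 'P_TERM']]
    # A left merge keeps every input row, in input order
    merged = stresc.rename('STRESC').to_frame().merge(mapping, on='STRESC', how='left')
    result = merged['P_TERM'].fillna('UNMAPPED')
    # merge() returns a fresh RangeIndex, so restore the caller's index
    result.index = stresc.index
    return result

# Hypothetical sample data with a non-default index
obsData = pd.DataFrame({
    'STRESC': ['STAINED', 'TORN'],
    'P_TERM': ['DISCOLORED', 'DAMAGED'],
})
df = pd.DataFrame({'P_TERM': ['TORN', 'STAINED', 'FADED']}, index=[10, 20, 30])

df['P_TERM'] = getPreferredTerms(df['P_TERM'], obsData)
print(df['P_TERM'].tolist())  # ['DAMAGED', 'DISCOLORED', 'UNMAPPED']
```

Restoring the index matters because assigning a Series back to a DataFrame column aligns on index labels, not on position.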