I have a pandas dataframe that looks like:
import pandas as pd
d = {'some_col': ['A', 'B', 'C', 'D', 'E'],
     'alert_status': [1, 2, 0, 0, 5]}
df = pd.DataFrame(d)
Many tasks at my job repeat the same pandas operations, so I'm starting to write standardized functions that take a dataframe as a parameter and return something. Here's a simple one:
def alert_read_text(df, alert_status=None):
    if alert_status is None:
        print('Warning: A column name with the alerts must be specified')
    alert_read_criteria = df[alert_status] >= 1
    df[alert_status].loc[alert_read_criteria] = 1
    alert_status_dict = {0: 'Not Read',
                         1: 'Read'}
    df[alert_status] = df[alert_status].map(alert_status_dict)
    return df[alert_status]
I want the function to return a series, so that the result can be added as a new column to an existing dataframe:
df['alert_status_text'] = alert_read_text(df, alert_status='alert_status')
Currently the function does return the correct series, but it also modifies the existing column in place. How can I make it leave the original column passed in unmodified?
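For illustration, here is a minimal sketch of the behavior being asked for, not a definitive fix. It assumes that any status of 1 or more counts as read, as in the function above, and it builds a new series instead of assigning back into `df[alert_status]`. The name `alert_read_text_copy` is hypothetical.

```python
import pandas as pd


def alert_read_text_copy(df, alert_status):
    # Hypothetical non-mutating variant: the comparison creates a new
    # boolean Series, so nothing is written back into df.
    is_read = df[alert_status] >= 1
    # Map the booleans to labels; this also returns a new Series.
    return is_read.map({False: 'Not Read', True: 'Read'})


df = pd.DataFrame({'some_col': ['A', 'B', 'C', 'D', 'E'],
                   'alert_status': [1, 2, 0, 0, 5]})

# Add the text column; the original alert_status values stay untouched.
df['alert_status_text'] = alert_read_text_copy(df, 'alert_status')
print(df)
```

The original function changes the column because it assigns to `df[alert_status]` inside the function body. This sketch never assigns to `df`, so the caller decides where the result goes.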