I've a few questions here involving apply(), and I often see the comment that I shouldn't be editing or changing a dataframe with .apply(). Why is this?
Here's a simple use case that I have for apply. Just ping an api and append the results to the dataframe:
Initial setup:
import pandas as pd
from random import sample
x = pd.DataFrame({'col1': ['john', 'jim', 'mary'],
                  'col2': ['a@gmail.com', 'b@gmail.com', 'c@gmail.com']})
print(x)
   col1         col2
0  john  a@gmail.com
1   jim  b@gmail.com
2  mary  c@gmail.com
Fake api call. Takes a random result from a list:
mylist = ['valid','invalid']
def api(email):
    # sample() returns a one-element list, so take the element itself
    return sample(mylist, 1)[0]
The apply function, which will take the email, feed it to the api, parse the json, then append the result:
def myfun(row):
    email = row['col2']
    # fake API call
    api_response = api(email)
    # NOTE: THIS WOULD BE WHERE I PARSE THE JSON
    # if email is valid
    if api_response == 'valid':
        # append status
        row['status'] = 'success'
        # append some other data
        row['other_data'] = 'api_check_done'
        # return the row
        return row
    # otherwise fail status
    else:
        row['status'] = 'fail'
        row['other_data'] = 'api_check_done'
        # return the row
        return row

# apply the function
x.apply(myfun, axis=1)
   col1         col2  status      other_data
0  john  a@gmail.com    fail  api_check_done
1   jim  b@gmail.com    fail  api_check_done
2  mary  c@gmail.com    fail  api_check_done
It seems to work fine.
So I am wondering, what is the problem with this, and is there a better way to do it?
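For comparison, here is one alternative I can imagine: leave the rows alone, collect the API responses in a plain loop, and attach the new columns in one explicit step. This is only a sketch under my own assumptions, not necessarily the recommended approach. The fake api() is redefined here with random.choice so that it returns a plain string, and the names STATUSES, responses and result are made up for illustration.

```python
import random

import pandas as pd

STATUSES = ["valid", "invalid"]  # hypothetical responses from the fake API


def api(email):
    """Fake API call (stand-in for a real request); returns one status string."""
    return random.choice(STATUSES)


x = pd.DataFrame({
    "col1": ["john", "jim", "mary"],
    "col2": ["a@gmail.com", "b@gmail.com", "c@gmail.com"],
})

# Call the API once per email, outside of any DataFrame method.
# NOTE: this would be where the JSON gets parsed in the real version.
responses = [api(email) for email in x["col2"]]

# Build the new columns from the responses and attach them in one step.
# assign() returns a new DataFrame and leaves x unchanged.
result = x.assign(
    status=["success" if r == "valid" else "fail" for r in responses],
    other_data="api_check_done",
)
print(result)
```

The function passed around here never mutates a row, so the only place the dataframe changes shape is the assign() call at the end.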