
I have a few questions about `apply()`. I often see comments saying that I shouldn't edit or modify a DataFrame inside `.apply()`. Why is this?

Here's a simple use case I have for `apply`: call an API for each row and append the results to the DataFrame:

Initial setup:

import pandas as pd
from random import sample

x = pd.DataFrame({'col1':['john','jim','mary'],
                 'col2':['a@gmail.com', 'b@gmail.com', 'c@gmail.com']})
print(x)

   col1         col2
0  john  a@gmail.com
1   jim  b@gmail.com
2  mary  c@gmail.com

Fake API call. It returns a random result from a list:

mylist = ['valid', 'invalid']

def api(email):
    # sample() returns a list even when k=1, so take its only element
    return sample(mylist, 1)[0]
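Note on the fake API: `random.sample(population, k)` always returns a list, even when `k=1`, so its result has to be indexed (or `random.choice` used instead) before it is compared with the string `'valid'`. Otherwise the comparison is always `False` and every row is marked `fail`. A quick check:

```python
import random

mylist = ['valid', 'invalid']

# sample() returns a list of k items, even when k=1
picked = random.sample(mylist, 1)
print(type(picked), picked == 'valid')   # <class 'list'> False  (a list never equals a string)

# choice() returns a single element, so a string comparison can succeed
chosen = random.choice(mylist)
print(type(chosen), chosen in mylist)    # <class 'str'> True
```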

The function passed to `apply` takes the email, feeds it to the API, parses the JSON, and then appends the result to the row:

def myfun(row):
    email = row['col2']

    # fake API call
    api_response = api(email)

    # NOTE: THIS WOULD BE WHERE I PARSE THE JSON

    # add a status depending on whether the email is valid
    if api_response == 'valid':
        row['status'] = 'success'
    else:
        row['status'] = 'fail'

    # add some other data, then return the modified row
    row['other_data'] = 'api_check_done'
    return row

# apply the function; apply returns a new DataFrame, so assign it back
x = x.apply(myfun, axis=1)
print(x)


   col1         col2 status      other_data
0  john  a@gmail.com   fail  api_check_done
1   jim  b@gmail.com   fail  api_check_done
2  mary  c@gmail.com   fail  api_check_done

It seems to work fine.
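It may help to see what `apply(..., axis=1)` does with the rows it passes in: each row arrives as a `Series`, and the values the function returns are assembled into a new DataFrame. The original frame only gains the new columns if that result is assigned back. A minimal sketch, using a hypothetical deterministic `add_status` (not part of the question) instead of the random fake API so the result is reproducible:

```python
import pandas as pd

x = pd.DataFrame({'col1': ['john', 'jim', 'mary'],
                  'col2': ['a@gmail.com', 'b@gmail.com', 'c@gmail.com']})

def add_status(row):
    # hypothetical deterministic check standing in for the API call
    row['status'] = 'success' if row['col2'].startswith('a') else 'fail'
    return row

result = x.apply(add_status, axis=1)

print(result.columns.tolist())  # includes the new 'status' column
print(x.columns.tolist())       # x itself still has only col1 and col2
```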

So I am wondering, what is the problem with this, and is there a better way to do it?
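The pandas documentation for `DataFrame.apply` notes that functions which mutate the object passed to them can produce unexpected behavior or errors and are not supported, which is the usual reason behind the advice. One way to sidestep the issue, sketched here with a hypothetical, deterministic `check_email` in place of the real API call and JSON parsing, is to call the API once per email with `Series.map` and then build the new columns outside the function:

```python
import pandas as pd

x = pd.DataFrame({'col1': ['john', 'jim', 'mary'],
                  'col2': ['a@gmail.com', 'b@gmail.com', 'c@gmail.com']})

def check_email(email):
    # hypothetical stand-in for the real API call + JSON parsing;
    # returns 'valid' or 'invalid' for a single address
    return 'valid' if email.startswith(('a', 'c')) else 'invalid'

# one call per email; the function only maps an email to a response
responses = x['col2'].map(check_email)

# build the new columns from the responses, outside the mapped function
x['status'] = 'fail'
x.loc[responses == 'valid', 'status'] = 'success'
x['other_data'] = 'api_check_done'
print(x)
```

The API is still called one email at a time, so this is not faster than `apply`; the difference is that the mapped function never modifies a row it was handed.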

SCool
  • I think `apply` is fine here; don't worry, use it. But if a vectorized solution exists, such as `df['a'] + df['b']`, then using `apply` (`df.apply(lambda x: x.a + x.b, axis=1)`) is a bad idea. – jezrael Aug 30 '19 at 08:20
  • Yes, I always try to use a vectorized solution if I can. But for complex API calls I think `apply` is useful. – SCool Aug 30 '19 at 08:29
  • Just give this a read, it's a pretty thoughtful one: https://stackoverflow.com/questions/54432583/when-should-i-ever-want-to-use-pandas-apply-in-my-code – sameera sy Aug 30 '19 at 08:37
  • @sameerasy Interesting read. It seems `apply` is not recommended for simple operations like `df['a'] + df['b']`. But for something like an API call, with conditions (`if this then that`) and parsing a JSON response, I think `apply` is OK, because calling the API will be slow anyway. – SCool Aug 30 '19 at 09:51
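jezrael's point about vectorized operations can be checked directly: the two expressions below compute the same row-wise sum, but the first operates on whole columns at once while the second calls a Python lambda for every row. A minimal sketch with a small hypothetical frame that has numeric columns `a` and `b`:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

vectorized = df['a'] + df['b']                     # whole-column arithmetic
row_wise = df.apply(lambda x: x.a + x.b, axis=1)   # one Python call per row

print(vectorized.tolist())  # [11, 22, 33]
print(row_wise.tolist())    # [11, 22, 33]
```

Both give the same values; the vectorized form simply avoids the per-row Python overhead, which matters for cheap arithmetic but much less when each row waits on a slow API call.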
