Why does Map work but Apply raise a ValueError?

Question

I'm trying to get more comfortable with various ways of using Pandas, and I'm struggling to understand why Map, Apply, and Vectorization are relatively interchangeable with functions that return non-booleans, but Apply and Vectorization sometimes fail when the function being applied returns a boolean. This question will focus on Apply.

Specifically, I wrote this very simple code to illustrate the problem:

import numpy as np
import pandas as pd

# make dataframe
x = range(1000)
df = pd.DataFrame(data = x, columns = ['Number']) 

# simple function to test if a number is a prime number
def is_prime(num):
    if num < 2:
        return False
    elif num == 2: 
        return True
    else: 
        for i in range(2,num):
            if num % i == 0:
                return False
    return True

# test if every number in the dataframe is prime using Map
df['map prime'] = list(map(is_prime, df['Number']))
df.head()

This gives the output I'd expect: a `map prime` column of True/False values.

So here's where I no longer understand what's going on: when I try to use apply, I get a ValueError.

in: df['apply prime'] = df.apply(func = is_prime, args = df['Number'], axis=1)
out: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What am I missing?

Thank you!

P.S. I know there are more efficient ways to test for primes. I purposely wrote an inefficient function so I could test how much faster apply and vectorization really are than map, but then I ran into this problem. Thank you.

4.Pi.n · Accepted Answer · 2021-02-09T06:04:05.577


> So here's where I no longer understand what's going on: when I try to use apply, I get a ValueError.

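With `df.apply(..., axis=1)`, pandas passes each row to your function as a `pd.Series`, not as a single number. Inside `is_prime`, `num < 2` then produces a boolean Series, and evaluating `if` on a whole Series is exactly what raises "The truth value of a Series is ambiguous". Also note that `args` is for extra positional arguments passed to the function after the row; passing `df['Number']` there does not select the column to operate on.

Call `apply` on the column (a `pd.Series`) instead, so the function receives one scalar at a time, i.e. `df['apply prime'] = df['Number'].apply(func = is_prime)` should work.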
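A minimal sketch of the difference, reusing the question's `is_prime` on a smaller frame (the 20-row size and the `row_apply_failed` flag are only illustrative): the row-wise `DataFrame.apply` fails because each call gets a Series, while `Series.apply` gets scalars and matches the `map` result.

```python
import pandas as pd

def is_prime(num):
    # same trial-division test as in the question
    if num < 2:
        return False
    elif num == 2:
        return True
    else:
        for i in range(2, num):
            if num % i == 0:
                return False
    return True

df = pd.DataFrame({'Number': range(20)})

# DataFrame.apply with axis=1: each call receives a whole row (a pd.Series),
# so `num < 2` is a boolean Series and `if` cannot reduce it to True/False
try:
    df.apply(is_prime, axis=1)
    row_apply_failed = False
except ValueError as exc:
    row_apply_failed = True
    print(exc)  # The truth value of a Series is ambiguous. ...

# Series.apply: each call receives a single scalar, just like map
df['apply prime'] = df['Number'].apply(is_prime)
df['map prime'] = list(map(is_prime, df['Number']))

print(df['apply prime'].tolist() == df['map prime'].tolist())  # True
```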

> Given that apply is ostensibly faster than map, and vectorization faster still.

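In addition, `pd.DataFrame.apply(...)` doesn't use any kind of vectorization: it still calls your Python function once per row in a plain loop, so I'd expect `list(map(...))` to be at least as fast. The difference is a constant factor, not an asymptotic one, since both make one Python call per element.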
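For contrast, vectorization in the NumPy sense replaces per-element Python calls with whole-array operations. The sketch below swaps in a Sieve of Eratosthenes, a different algorithm from the question's trial division, chosen only because it vectorizes naturally. `sieve_is_prime` is a hypothetical helper name, and it assumes the column holds non-negative integers.

```python
import numpy as np
import pandas as pd

def sieve_is_prime(numbers):
    """Hypothetical helper: vectorized primality test for a non-empty array
    of non-negative integers (Sieve of Eratosthenes, not trial division)."""
    numbers = np.asarray(numbers)
    limit = int(numbers.max()) + 1
    is_p = np.ones(max(limit, 2), dtype=bool)
    is_p[:2] = False  # 0 and 1 are not prime
    # only about sqrt(limit) Python-level iterations
    for p in range(2, int(limit ** 0.5) + 1):
        if is_p[p]:
            # cross out every multiple of p from p*p onward in one slice assignment
            is_p[p * p::p] = False
    # look up every input value in the boolean table at once
    return is_p[numbers]

df = pd.DataFrame({'Number': range(1000)})
df['vectorized prime'] = sieve_is_prime(df['Number'].to_numpy())
print(df['vectorized prime'].sum())  # 168 primes below 1000
```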

Update

Note that for a `pd.DataFrame` (multiple columns), the `.apply(...)` method passes the values along the given axis to the function, and the function may return a value of any data type.
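To illustrate the return-type point, a small sketch (the column names and the `describe_row` helper are hypothetical): with `axis=1`, returning a scalar gives one value per row, i.e. a Series, while returning a `pd.Series` expands into the columns of a new DataFrame under the default `result_type=None`.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 5, 3], 'b': [4, 2, 3]})

# scalar return value -> one value per row -> a Series
bigger = df.apply(lambda row: row['a'] > row['b'], axis=1)
print(type(bigger).__name__)  # Series

def describe_row(row):
    # hypothetical helper returning several values per row as a Series
    return pd.Series({'max': max(row['a'], row['b']), 'a_wins': row['a'] > row['b']})

# Series return value -> its index becomes the columns of a DataFrame
expanded = df.apply(describe_row, axis=1)
print(type(expanded).__name__)  # DataFrame
print(list(expanded.columns))   # ['max', 'a_wins']
```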

Suppose that `df.shape == (1000, 4)`. If we apply along `axis=1`, the function is called once per row, i.e. `df.shape[0]` = 1000 times, and on each call it receives a `pd.Series` of shape `(4,)` indexed by the column names. You can hard-code those keys inside the function itself, or pass them in as extra arguments with `pd.DataFrame.apply(..., args=(...))`.
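One way to check this description is to record what the function receives. A minimal sketch, assuming a 1000 × 4 frame as in the example; the `seen` list and `record` function are only instrumentation for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 4), columns=['a', 'b', 'c', 'd'])

seen = []

def record(row):
    # each call receives one row as a pd.Series indexed by the column names
    seen.append((type(row).__name__, row.shape, list(row.index)))
    return row['a'] > row['b']

df.apply(record, axis=1)

print(len(seen))  # one call per row in recent pandas versions, i.e. df.shape[0]
print(seen[0])    # ('Series', (4,), ['a', 'b', 'c', 'd'])
```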

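import numpy as np
import pandas as pd

# 1000 rows x 4 columns of random numbers
x = np.random.randn(1000, 4)
df = pd.DataFrame(data=x, columns=['a', 'b', 'c', 'd'])

print(df.shape)  # (1000, 4)

df.head()

def func(x, key1, key2):
    # x is one row: a pd.Series of shape (4,) indexed by column name
    # key1 and key2 are the column names forwarded through args
    if x[key1] > x[key2]:
        return True
    return False

# called once per row; returns a boolean Series of length 1000
df.apply(func, axis=1, args=('a', 'b'))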
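Since this particular test is a plain column comparison, the same result can be obtained without `apply` at all. A sketch checking that both agree; the fixed seed is added only to make the check reproducible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 4)), columns=['a', 'b', 'c', 'd'])

def func(x, key1, key2):
    # x is one row; key1/key2 are column names forwarded through args
    return bool(x[key1] > x[key2])

# row-wise apply: one Python call per row
via_apply = df.apply(func, axis=1, args=('a', 'b'))

# vectorized: a single element-wise comparison of two whole columns
via_vector = df['a'] > df['b']

print(via_apply.tolist() == via_vector.tolist())  # True
```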

answered Feb 09 '21 at 05:19 by 4.Pi.n, edited Feb 09 '21 at 06:04


That did work, thank you! But now I have a second question: what if I want to apply a function that takes two variables as inputs and returns a boolean (e.g. if x > y return true)? How do I pass both? I tried calling df['Num1', 'Num2'].apply(...), but that threw a KeyError even though the keys were correct. Map worked when I passed the keys in as a tuple. Thoughts on how to use apply with a function that takes multiple inputs? – BLimitless Feb 09 '21 at 05:34
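A sketch of why `df['Num1', 'Num2']` fails and two alternatives that work; `Num1`/`Num2` are the commenter's column names, and `greater` is a hypothetical two-argument function. A single pair of brackets holding two names is looked up as one tuple key `('Num1', 'Num2')`, which is not a column, so pandas raises `KeyError`; a list inside the brackets selects two columns.

```python
import pandas as pd

df = pd.DataFrame({'Num1': [3, 1, 7], 'Num2': [2, 5, 7]})

def greater(x, y):
    # hypothetical two-input function returning a boolean
    return x > y

# df['Num1', 'Num2'] looks up the single key ('Num1', 'Num2') -> KeyError
try:
    df['Num1', 'Num2']
    tuple_key_failed = False
except KeyError:
    tuple_key_failed = True

# double brackets select a two-column DataFrame; axis=1 hands each row to the lambda
df['apply greater'] = df[['Num1', 'Num2']].apply(
    lambda row: greater(row['Num1'], row['Num2']), axis=1)

# map accepts several iterables and passes one element from each per call
df['map greater'] = list(map(greater, df['Num1'], df['Num2']))

print(df)
```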
@BLimitless, you can refer to [this answer](https://stackoverflow.com/questions/13331698/how-to-apply-a-function-to-two-columns-of-pandas-dataframe/52854800#52854800) of the post [How to apply a function to two columns of Pandas dataframe](https://stackoverflow.com/questions/13331698/how-to-apply-a-function-to-two-columns-of-pandas-dataframe) – SeaBean Feb 09 '21 at 06:22
@BLimitless, note also that your use of list(map(...)) is actually faster than apply(...axis=1). You can refer to the timing comparison in [this answer](https://stackoverflow.com/a/46923192/15070697) of the same post I suggested above. – SeaBean Feb 09 '21 at 06:25
@4.Pi.n, I think your sample code can be simplified to df.apply(lambda x: func(x['a'], x['b']), axis=1), with `def func(key1, key2)` comparing `key1 > key2`, so that the function is more generic and can be used outside pandas. – SeaBean Feb 09 '21 at 06:41
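SeaBean's suggestion, sketched under the same random-data setup as the answer: the comparison lives in a plain two-argument `func` that knows nothing about pandas, and the lambda does the column lookup at the call site.

```python
import numpy as np
import pandas as pd

def func(key1, key2):
    # generic: compares two plain values, no pandas involved
    return key1 > key2

df = pd.DataFrame(np.random.randn(1000, 4), columns=['a', 'b', 'c', 'd'])

# the lambda pulls the two values out of each row and forwards them
via_lambda = df.apply(lambda x: func(x['a'], x['b']), axis=1)

# the same function also works on ordinary Python values
print(func(3, 2))  # True

print(via_lambda.tolist() == (df['a'] > df['b']).tolist())  # True
```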
@BLimitless, see also [my answer](https://stackoverflow.com/a/66034661/15070697) in a previous post comparing list(map(...)) and apply(..., axis=1), and [this one](https://stackoverflow.com/a/66062197/15070697) as well. Hence, I suggest sticking with list(map(...)) instead of apply(..., axis=1). – SeaBean Feb 09 '21 at 07:10
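To reproduce this kind of comparison locally, a minimal `timeit` sketch; the frame size and the run count are arbitrary choices, and the timings depend on the machine and pandas version, so no numbers are claimed here. All three candidates are checked for identical results before being timed.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10_000, 2)), columns=['a', 'b'])

def func(key1, key2):
    return key1 > key2

candidates = {
    'list(map(...))': lambda: list(map(func, df['a'], df['b'])),
    'apply(..., axis=1)': lambda: df.apply(lambda x: func(x['a'], x['b']), axis=1).tolist(),
    'vectorized': lambda: (df['a'] > df['b']).tolist(),
}

# all approaches must agree before their speed is worth comparing
results = {name: fn() for name, fn in candidates.items()}
reference = results['vectorized']
assert all(r == reference for r in results.values())

for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=3)
    print(f'{name:>20}: {seconds:.4f} s for 3 runs')
```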
