I'm trying to get more comfortable with the various ways of using pandas, and I'm struggling to understand why map, apply, and vectorization are more or less interchangeable for functions that return non-booleans, while apply and vectorization sometimes fail when the function being applied returns a boolean. This question focuses on apply.
Specifically, I wrote this very simple bit of code to illustrate the problem:
import numpy as np
import pandas as pd
# make dataframe
x = range(1000)
df = pd.DataFrame(data = x, columns = ['Number'])
# simple function to test if a number is a prime number
def is_prime(num):
    if num < 2:
        return False
    elif num == 2:
        return True
    else:
        for i in range(2, num):
            if num % i == 0:
                return False
        return True
# test if every number in the dataframe is prime using Map
df['map prime'] = list(map(is_prime, df['Number']))
df.head()
This gives the output I'd expect.
So here's where I no longer understand what's going on: when I try to use apply, I get a ValueError.
in: df['apply prime'] = df.apply(func = is_prime, args = df['Number'], axis=1)
out: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
What am I missing?
Thank you!
P.S. I know there are more efficient ways to test for primes. I deliberately wrote an inefficient function so I could measure how much faster apply and vectorization really are than map, but then I ran into this problem.
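For comparison, here is a minimal sketch of the apply forms I would expect to match the map result. It assumes that Series.apply passes each value to the function one at a time, and that DataFrame.apply with axis=1 passes each row as a Series. The names via_map, via_series_apply, via_row_apply, and raised_value_error are just for illustration, and the last part only shows that a whole Series used where a single True/False is needed produces the same kind of ValueError.

```python
import pandas as pd


def is_prime(num):
    """Trial-division primality test, same logic as the function in the question."""
    if num < 2:
        return False
    for i in range(2, num):
        if num % i == 0:
            return False
    return True


df = pd.DataFrame({'Number': range(20)})

# Baseline: the built-in map calls is_prime once per scalar value.
via_map = list(map(is_prime, df['Number']))

# Series.apply also calls is_prime once per scalar value.
via_series_apply = df['Number'].apply(is_prime)

# DataFrame.apply with axis=1 hands each row over as a Series,
# so the scalar has to be pulled out of the row before calling is_prime.
via_row_apply = df.apply(lambda row: is_prime(row['Number']), axis=1)

# A whole Series has no single truth value, so forcing one raises ValueError.
try:
    bool(df['Number'] < 2)
    raised_value_error = False
except ValueError:
    raised_value_error = True
```

If these assumptions hold, both apply results should line up with the map column, which would suggest the problem lies in how the function and its arguments are handed to apply, not in the boolean return value.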