I have a small function I'm running in pandas that throws a ValueError when I run an if x in y statement. I saw similar-sounding problems recommending Boolean indexing, .isin(), and where(), but I wasn't able to adapt any of the examples to my case. Any advice would be very much appreciated.
Additional note: groups is a list of lists of strings that lives outside the dataframe. My goal with the function is to see which list an item from the dataframe is in, then return the index of that list. My first version of this (in the notebook linked below) uses iterrows to loop through the dataframe, but I understand that is sub-optimal in most cases.
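A minimal sketch of that goal without iterrows, again assuming small hypothetical fake data. This is a swapped-in technique, not the notebook's approach. It inverts groups into a dict mapping each string to the index of the first list that contains it (matching groupFinder, which returns the first hit), then looks up every row at once with Series.map.

```python
import pandas as pd

# Hypothetical fake data standing in for the notebook's groups and df
groups = [["apple", "banana"], ["carrot", "daikon"], ["egg"]]
df = pd.DataFrame({"item": ["banana", "egg", "carrot", "fig"]})

# Invert groups: string -> index of the first group that contains it
lookup = {}
for idx, group in enumerate(groups):
    for item in group:
        lookup.setdefault(item, idx)  # keep the first group, as groupFinder does

# Series.map looks each value up in the dict; values not found become NaN
df["groupID2"] = df["item"].map(lookup)
print(df)
```

Items that appear in no group come out as NaN, which makes the column float. The dict is built once, so each row costs a single hash lookup instead of a scan over every list.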
Jupyter notebook with some fake data: https://github.com/amoebahlan61/sturdy-chainsaw/blob/master/Grouping%20Test_1.1.ipynb
Thank you!
Code:
def groupFinder(item):
    for group in groups:
        if item in group:
            return groups.index(group)

df['groupID2'] = groupFinder(df['item'])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-808ac3e51e1f> in <module>()
      4             return groups.index(group)
      5
----> 6 df['groupID2'] = groupFinder(df['item'])

<ipython-input-16-808ac3e51e1f> in groupFinder(item)
      1 def groupFinder(item):
      2     for group in groups:
----> 3         if item in group:
      4             return groups.index(group)
      5

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
    953         raise ValueError("The truth value of a {0} is ambiguous. "
    954                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 955                          .format(self.__class__.__name__))
    956
    957     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
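The traceback points at if item in group. A minimal reproduction with a hypothetical two-item Series shows the mechanism. List membership compares the target against each list element with ==. When the target is a whole Series, that comparison returns a boolean Series rather than a single True/False, and pandas refuses to guess its truth value.

```python
import pandas as pd

# Hypothetical stand-ins for one entry of groups and for df['item']
group = ["apple", "banana"]
s = pd.Series(["banana", "egg"])

# A single string works: it is compared with each list element in turn
print("banana" in group)  # True

# A whole Series does not: comparing s with a list element gives a
# boolean Series, and the membership test then needs bool() of it
try:
    s in group
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```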
Solution
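I came across some pandas blog posts and also got feedback from a reddit user, which led me to a solution that skips iterrows by using pandas' apply function. The original call failed because groupFinder(df['item']) passes the whole Series at once, so item in group compares the entire Series against each string in the list and gets back a boolean Series that Python cannot reduce to a single True/False. Series.apply instead calls groupFinder once per value, so item is a plain string on every call.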
df['groupID2'] = df.item.apply(groupFinder)
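For reference, here is a self-contained version of the apply fix using small hypothetical fake data. The groupFinder below is a slightly modified variant, not the original. It uses enumerate instead of groups.index(group), which would rescan groups for a position the loop already knows, and it returns None explicitly when an item is in no group.

```python
import pandas as pd

# Hypothetical fake data standing in for the notebook's groups and df
groups = [["apple", "banana"], ["carrot", "daikon"], ["egg"]]
df = pd.DataFrame({"item": ["banana", "egg", "carrot", "daikon"]})

def groupFinder(item):
    # item is a single string here, so `in` is an ordinary list membership test
    for idx, group in enumerate(groups):
        if item in group:
            return idx
    return None  # item is not in any group

# apply calls groupFinder once for each value in the column
df["groupID2"] = df["item"].apply(groupFinder)
print(df)
```

Each call still runs as ordinary Python, one value at a time, but it avoids building a row Series for every row the way iterrows does.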
Thank you everyone for your help and responses.