Pandas: Alternative to iterrow loops

Question

I have a small function that I'm running on a pandas DataFrame column, and it throws a ValueError at an if x in y statement. I found similar-sounding questions recommending boolean indexing, .isin(), and .where(), but I wasn't able to adapt any of the examples to my case. Any advice would be very much appreciated.

Additional note: groups is a list of lists of strings, defined outside the dataframe. My goal with the function is to see which list an item from the dataframe is in, then return the index of that list. My first version, in the notebook linked below, uses iterrows to loop through the dataframe, but I understand that is sub-optimal in most cases.

Jupyter notebook with some fake data: https://github.com/amoebahlan61/sturdy-chainsaw/blob/master/Grouping%20Test_1.1.ipynb

Thank you!

Code:

def groupFinder(item):
    for group in groups:
        if item in group:
            return groups.index(group)

df['groupID2'] = groupFinder(df['item'])


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-808ac3e51e1f> in <module>()
      4             return groups.index(group)
      5 
----> 6 df['groupID2'] = groupFinder(df['item'])

<ipython-input-16-808ac3e51e1f> in groupFinder(item)
      1 def groupFinder(item):
      2     for group in groups:
----> 3         if item in group:
      4             return groups.index(group)
      5 

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
    953         raise ValueError("The truth value of a {0} is ambiguous. "
    954                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 955                          .format(self.__class__.__name__))
    956 
    957     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Solution

I came across some pandas blog posts and also got feedback from a reddit user, which led me to a solution that avoids iterrows by using pandas' apply function.

df['groupID2'] = df.item.apply(groupFinder)

Thank you everyone for your help and responses.

In general, it's not a good idea to include links to data in questions - links can die, for one thing, and for another, it makes it harder to help you. In this case, clicking on your link *also* doesn't go to your notebook (although copying the URL string into the browser works). For the benefit of others who might use your question as a reference, consider moving your example data and setup into the text of your post, as a [Minimal, Complete, and Verifiable Example](https://stackoverflow.com/help/mcve). — andrew_reece, Dec 24 '17 at 20:21
@andrew_reece Thank you for the heads up on question and code etiquette. I'll be sure to use that going forward. — James Marsden, Dec 26 '17 at 02:42

score 0 · Answer 1 · answered Dec 24 '17 at 19:57

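The way to use isin is to first call Series.isin(...) to produce a boolean mask and then index using this mask. Alternatively, to run your function on the individual values rather than on the whole Series, call it once per element, for example [groupFinder(x) for x in df['item'].values]. Passing the whole array as groupFinder(df['item'].values) runs into the same ambiguous-truth-value error, because item in group still compares an entire array against each string.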
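
To make the isin route concrete, here is a minimal sketch. The groups and df below are made-up stand-ins for the question's data (the real ones live in the linked notebook), and using -1 for items found in no group is my own assumption; groupFinder would return None in that case.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's `groups` and `df`.
groups = [
    ["Chickpeas", "Pinto beans"],
    ["Basil", "Mint"],
    ["Turnip", "Rutabaga"],
]
df = pd.DataFrame({"item": ["Turnip", "Basil", "Chickpeas", "Mint", "Quinoa"]})

# Assumption: -1 marks items that belong to none of the groups.
df["groupID2"] = -1

for group_id, group in enumerate(groups):
    # Boolean mask: True for rows whose item appears in this group.
    mask = df["item"].isin(group)
    # Index with the mask and write the group's position in `groups`.
    df.loc[mask, "groupID2"] = group_id

print(df)
```

This loops over the groups rather than over the rows, so the number of Python-level iterations depends on how many groups there are, not on the length of the dataframe.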

rvd


score 0 · Answer 2 · answered Dec 24 '17 at 22:51

IIUC, you can do what you want in just a few lines using Pandas:

import pandas as pd

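# create master list of items
# (legumesGroup, herbGroup and radishGroup are assumed to be the lists of strings from the question's notebook)
master = pd.Series(legumesGroup + herbGroup + radishGroup)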

# assign group id as index
master.index = [0]*len(legumesGroup) + [1]*len(herbGroup) + [2]*len(radishGroup)

# sample from master with replacement to get itemList
itemList = master.sample(n=1000, replace=True)

Now, to get the group of each item in itemList, look at itemList itself (its index holds the group ID next to each item), or use just itemList.index for the group IDs alone.

itemList.head()

Output:

2        Horseradish
2           Rutabaga
2             Turnip
0          Chickpeas
0        Pinto beans
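
If the items already sit in a DataFrame column, the same idea of pairing each item with its group ID can be sketched as a dictionary lookup with Series.map. This is a swapped-in technique rather than what the answer above does, and the groups and df here are hypothetical toy data, not the notebook's.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's `groups` and `df`.
groups = [
    ["Chickpeas", "Pinto beans"],
    ["Basil", "Mint"],
    ["Turnip", "Rutabaga"],
]
df = pd.DataFrame({"item": ["Turnip", "Basil", "Chickpeas", "Mint", "Quinoa"]})

# Build a flat item -> group index lookup once, up front.
lookup = {item: group_id for group_id, group in enumerate(groups) for item in group}

# Map every item to its group index; items missing from the lookup become NaN.
df["groupID2"] = df["item"].map(lookup)

print(df)
```

Two caveats with this sketch: because unmatched items become NaN, the column ends up as floats whenever something is missing, and if an item appears in more than one group the dictionary keeps the last group, whereas groupFinder returns the first.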

This is a really interesting solution. I would not have thought about adding an index value to the group items. Thanks! — James Marsden, Dec 26 '17 at 02:40

score 0 · Answer 3 · answered Dec 26 '17 at 02:53

Solution

I came across some pandas blog posts and also got feedback from a reddit user, which led me to a solution that avoids iterrows by using pandas' apply function.

df['groupID2'] = df.item.apply(groupFinder)

Thank you everyone for your help and responses.
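
For anyone wondering why apply works where the original call did not: apply hands groupFinder one value at a time, while groupFinder(df['item']) hands it the whole Series, so item in group ends up asking for the truth value of a Series. A small sketch with made-up data (groups and df are hypothetical stand-ins for the notebook's):

```python
import pandas as pd

# Hypothetical toy data standing in for the question's `groups` and `df`.
groups = [
    ["Chickpeas", "Pinto beans"],
    ["Basil", "Mint"],
    ["Turnip", "Rutabaga"],
]
df = pd.DataFrame({"item": ["Turnip", "Basil", "Chickpeas", "Mint"]})


def groupFinder(item):
    """Return the index of the first group containing `item` (None if absent)."""
    for group in groups:
        if item in group:
            return groups.index(group)


# Passing the whole column: `item in group` compares the Series with each
# string, and the resulting boolean Series has no single truth value.
try:
    groupFinder(df["item"])
except ValueError as err:
    whole_column_error = str(err)
    print("whole column:", whole_column_error)

# apply calls groupFinder once per element, so `item` is a plain string.
df["groupID2"] = df["item"].apply(groupFinder)
print(df)
```

This is still a Python-level call per row, so it is tidier than iterrows but not truly vectorized.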
