
I am trying to do a groupby in a Pandas dataframe (column Mode) followed by an apply that sums the values in column Sensor Glucose (mg/dL) for rows where the isInRange function returns True.

For some reason, I receive:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Yet if I replace the function with something simpler, like x[x['Sensor Glucose (mg/dL)']>160]['Sensor Glucose (mg/dL)'].sum(), it works fine. I'm not sure why it isn't applying this function as expected.

DataFrame Info

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 55343 entries, 0 to 55342
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype         
---  ------                  --------------  -----         
 0   Date                    55343 non-null  object        
 1   Time                    55343 non-null  object        
 2   Sensor Glucose (mg/dL)  55343 non-null  float64       
 3   DateTime                55343 non-null  datetime64[ns]
 4   Mode                    55343 non-null  object        
 5   ResultRange             55343 non-null  object        
dtypes: datetime64[ns](1), float64(1), object(4)
memory usage: 2.5+ MB

isInRange function

def isInRange(cgmValue, rangeName):
    if cgmValue > 250:
        return rangeName == 'hyperglycemia-critical'
    elif cgmValue > 180:
        return rangeName == 'hyperglycemia'
    elif cgmValue >= 70:
        if (cgmValue <= 150):
            return rangeName == 'cgm70:180' or rangeName == 'cgm70:150'
        else:
            return rangeName == 'cgm70:180'
    elif cgmValue >= 54:
        return rangeName == 'hypoglycemia-level1'
    else:
        return rangeName == 'hypoglycemia-level2' 

GroupBy/Apply

Result = CGM.groupby('Mode').apply(lambda x: x[isInRange(x['Sensor Glucose (mg/dL)'],'hyperglycemia-critical') == True]['Sensor Glucose (mg/dL)'].sum())
yudhiesh
qscott86
  • `.apply` sends the entire column as `x`. You can add a print statement to your function to test it for yourself. [`pandas.core.groupby.GroupBy.apply`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.GroupBy.apply.html): _Apply function func **group-wise** and combine the results together._ – Trenton McKinney Jan 19 '21 at 03:55
  • In the future, always provide a complete [mre] with code, **data, errors, current output, and expected output**, as **[formatted text](https://stackoverflow.com/help/formatting)**. If relevant, only plot images are okay. Please see [How to ask a good question](https://stackoverflow.com/help/how-to-ask). – Trenton McKinney Jan 19 '21 at 04:05

1 Answer


I tried a minimal example:

import pandas as pd

d = {'Mode': ['Red', 'Green', 'Blue', 'Red', 'Green', 'Blue'],
     'Sensor Glucose (mg/dL)': [270, 190, 160, 140, 60, 40]}
CGM = pd.DataFrame(data=d)

The resulting dataframe is:

    Mode  Sensor Glucose (mg/dL)
0    Red                     270
1  Green                     190
2   Blue                     160
3    Red                     140
4  Green                      60
5   Blue                      40

The first row is:

CGM.loc[[0]]

giving us the output of

  Mode  Sensor Glucose (mg/dL)
0  Red                     270

However, if you call

CGM.loc[[0]]['Sensor Glucose (mg/dL)']

what you get is

0    270
Name: Sensor Glucose (mg/dL), dtype: int64

which is actually a Series, not a single number.
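To make the distinction concrete, here is a small sketch on the same example data. The variable names `col`, `one_row_series`, and `scalar` are only illustrative and do not come from the question.

```python
import pandas as pd

CGM = pd.DataFrame({
    'Mode': ['Red', 'Green', 'Blue', 'Red', 'Green', 'Blue'],
    'Sensor Glucose (mg/dL)': [270, 190, 160, 140, 60, 40],
})
col = 'Sensor Glucose (mg/dL)'  # illustrative shorthand for the column name

# A list indexer ([[0]]) keeps a one-row DataFrame, so selecting a column
# from it still yields a Series (of length 1).
one_row_series = CGM.loc[[0]][col]

# A row label plus a column label yields the single stored value.
scalar = CGM.loc[0, col]

print(type(one_row_series).__name__)  # Series
print(scalar)                         # 270
```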

If you put this series into the function:

isInRange(CGM.loc[[0]]['Sensor Glucose (mg/dL)'],'hyperglycemia-critical')

it raises an error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-100-f4a597304967> in <module>
----> 1 isInRange(CGM.loc[[0]]['Sensor Glucose (mg/dL)'],'hyperglycemia-critical')

<ipython-input-50-c213dcae68c1> in isInRange(cgmValue, rangeName)
      1 def isInRange(cgmValue, rangeName):
----> 2     if cgmValue > 250:
      3         return rangeName == 'hyperglycemia-critical'
      4     elif cgmValue > 180:
      5         return rangeName == 'hyperglycemia'

~\Anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
   1438     def __nonzero__(self):
   1439         raise ValueError(
-> 1440             f"The truth value of a {type(self).__name__} is ambiguous. "
   1441             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1442         )

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
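The traceback points at the `if cgmValue > 250:` line. A minimal sketch of why that line fails: comparing a Series with a number produces a boolean Series, and `if` needs exactly one truth value. The names `s`, `mask`, and `raised` are made up for this illustration.

```python
import pandas as pd

s = pd.Series([270, 140])

# The comparison is element-wise: the result is a boolean Series, not one bool.
mask = s > 250
print(mask.tolist())  # [True, False]

# `if` asks the Series for a single truth value, which pandas refuses to guess.
try:
    if mask:
        pass
    raised = False
except ValueError as err:
    raised = True
    print(err)

# Reducing explicitly removes the ambiguity.
print(mask.any(), mask.all())
```

The same error is raised for a Series of length 1, which is the case shown in the traceback above.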

Instead, you should pass the function a single number, like this:

isInRange(CGM.loc[[0]].iloc[0]['Sensor Glucose (mg/dL)'], 'hyperglycemia-critical')

and you'll get the output:

True
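Applied to the original groupby, one possible fix is to call `isInRange` once per value with `Series.apply` and use the resulting boolean mask to select rows. The sketch below is not necessarily what the asker ended up using: `sum_in_range` and `col` are hypothetical names introduced here, and the function body is copied from the question. It also assumes that groups with no matching rows should report a sum of 0.

```python
import pandas as pd

def isInRange(cgmValue, rangeName):
    # Scalar-only function, copied from the question.
    if cgmValue > 250:
        return rangeName == 'hyperglycemia-critical'
    elif cgmValue > 180:
        return rangeName == 'hyperglycemia'
    elif cgmValue >= 70:
        if cgmValue <= 150:
            return rangeName == 'cgm70:180' or rangeName == 'cgm70:150'
        else:
            return rangeName == 'cgm70:180'
    elif cgmValue >= 54:
        return rangeName == 'hypoglycemia-level1'
    else:
        return rangeName == 'hypoglycemia-level2'

CGM = pd.DataFrame({
    'Mode': ['Red', 'Green', 'Blue', 'Red', 'Green', 'Blue'],
    'Sensor Glucose (mg/dL)': [270, 190, 160, 140, 60, 40],
})
col = 'Sensor Glucose (mg/dL)'

def sum_in_range(group, rangeName):
    # Hypothetical helper: evaluate isInRange value by value to get a
    # boolean mask, then sum only the rows where the mask is True.
    mask = group[col].apply(lambda v: isInRange(v, rangeName))
    return group.loc[mask, col].sum()

# Selecting [[col]] hands each group to the helper as a DataFrame
# that contains only the glucose column.
Result = CGM.groupby('Mode')[[col]].apply(
    sum_in_range, rangeName='hyperglycemia-critical'
)
print(Result)

# For this particular range the test reduces to "value > 250", so a fully
# vectorized version is possible without calling isInRange at all.
vectorized = CGM[col].where(CGM[col] > 250, 0).groupby(CGM['Mode']).sum()
print(vectorized)
```

On the six-row example, both versions give 270 for Red and 0 for Green and Blue. The per-value `apply` is slower than the vectorized comparison on large frames, but it reuses the existing function unchanged.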
Life is Good