The truth value of a Series is ambiguous... Error when running tukey_hsd from bioinfokit.analys

Question

I'm getting this error when trying to run the code below. I read this whole thread but I'm still at a loss and can't figure out where the error is coming from.

dataframe

  element  id   id_2 group condition level         vol      size    height
0     C11  28  CTL11  cont   control    l1  18687.3750  0.190136  0.682789
1     C11  28  CTL11  cont   control    l2  16797.3750  0.181322  0.829770
2     C11  28  CTL11  cont   control    l3  24813.0000  0.204907  0.812723
3     C12  29  CTL12  cont   control    l1  20069.4375  0.174686  0.719480
4     C12  29  CTL12  cont   control    l2  17323.8750  0.149539  0.836107

Code:

# copy original dataframe
ratios = df1
# aggregate data as a function of level. Add all other relevant variables.
ratios = ratios.groupby(["element", "id", "id_2", "group", "condition", "level"],
                        as_index=False).mean()
# pivot table. Each level is one column
rt = pd.pivot_table(ratios, values=['vol', 'size', 'height'], 
                        index=["element", "id", "id_2", "group", "condition"], 
                        columns=['level']).reset_index()

After pivoting on the level column (l1, l2 and l3), I create new columns by dividing certain columns by each other (e.g. vol or size).

# calculations
rt['l1l3_vol'] = rt['vol']['l1'] / rt['vol']['l3'] 
rt['l1l3_size'] = rt['size']['l1'] / rt['size']['l3']

# run test
design = 'group ~ C(l1l3_size)'
model = ols(design, data=rt).fit()
anova_table = sm.stats.anova_lm(model, typ=3)
# multicomp
res = stat()
res.tukey_hsd(df=rt, res_var='l1l3_size', xfac_var='group', 
              anova_model=design)

The code runs up to and including the res = stat() line and only raises an error on the res.tukey_hsd line. I suspect the error is related to the division operations, but I'm not sure how to overcome it.

Any idea is very welcome!

EDIT: Full error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-f9bb6e4e7e5a> in <module>
     42 res = stat()
     43 res.tukey_hsd(df=rt, res_var='l1l3_size', xfac_var='group', 
---> 44               anova_model=design)
     45 # print stats outputs
     46 print(anova_table)

/opt/anaconda3/envs/icvs/lib/python3.7/site-packages/bioinfokit/analys.py in tukey_hsd(self, df, res_var, xfac_var, anova_model, phalpha, ss_typ)
    797         comp_pairs = [(ele1, ele2) for i, ele1 in enumerate(list(mult_group)) for ele2 in list(mult_group)[i + 1:]]
    798         for p in comp_pairs:
--> 799             mean_diff = max(mult_group[p[0]], mult_group[p[1]]) - min(mult_group[p[0]], mult_group[p[1]])
    800             # count for groups; this is useful when sample size not equal -- Tukey-Kramer
    801             group1_count, group2_count = mult_group_count[p[0]], mult_group_count[p[1]]

/opt/anaconda3/envs/icvs/lib/python3.7/site-packages/pandas/core/generic.py in __nonzero__(self)
   1441     def __nonzero__(self):
   1442         raise ValueError(
-> 1443             f"The truth value of a {type(self).__name__} is ambiguous. "
   1444             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1445         )

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

As an aside `ratios = df1` does not _copy_ your original dataframe. — Henry Ecker, Jun 11 '21 at 15:19
@HenryEcker thanks! I always assumed it would copy the df but apparently you need to use: new_df = df.copy() https://stackoverflow.com/questions/27673231/why-should-i-make-a-copy-of-a-data-frame-in-pandas — Oiko, Jun 11 '21 at 15:34
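
The point raised in the comments above can be illustrated with a minimal sketch. The data frame and the names `alias` and `independent` are made up for illustration and are not taken from the question.

```python
import pandas as pd

# Toy stand-in for the original data frame (values are made up).
df1 = pd.DataFrame({"vol": [1.0, 2.0]})

alias = df1               # plain assignment: a second name for the same object
independent = df1.copy()  # .copy() creates a separate data frame

# Writing through the alias changes df1 as well.
alias.loc[0, "vol"] = 99.0

# Writing to the copy leaves df1 untouched.
independent.loc[1, "vol"] = -1.0

same_object = alias is df1
```

After running this, `df1` holds 99.0 in its first row because `alias` and `df1` are the same object, while the change made to `independent` is not visible in `df1`.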

score 1 · Answer 1 · answered Jun 11 '21 at 15:49

The problem is that you are passing a pandas.core.series.Series object to max(), see below:

In [1]: import pandas as pd                                                                       

In [2]: max(pd.Series([1,2,3]), pd.Series([3,5,7]))                                               
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-d0510caa4f61> in <module>
----> 1 max(pd.Series([1,2,3]), pd.Series([3,5,7]))

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/generic.py in __nonzero__(self)
   1476 
   1477     def __nonzero__(self):
-> 1478         raise ValueError(
   1479             f"The truth value of a {type(self).__name__} is ambiguous. "
   1480             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

In line 799 you pass mult_group[p[0]] to max(). That value comes from mult_group, mult_group_count, sample_size_r = analys_general.get_list_from_df(df, xfac_var, res_var, 'get_dict'), where df is your rt data frame. Check out the source code; maybe the get_list_from_df code will help you figure out whether your rt is as expected.
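
To make the mechanism explicit, here is a small sketch with toy Series. It assumes, based only on how line 799 uses the values, that tukey_hsd expects one scalar per group there; that expectation is an inference and is not confirmed against the bioinfokit source.

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 2, 3])
b = pd.Series([3, 5, 7])

# The comparison itself is fine: it is element-wise and returns a boolean Series.
comparison = b > a

# max() needs a single True/False from that comparison, and a Series
# refuses to collapse to one truth value, hence the ValueError.
try:
    max(a, b)
    raised = False
except ValueError:
    raised = True

# With scalars (assumed here to be what line 799 expects), max() works.
scalar_max = max(a.mean(), b.mean())

# If an element-wise maximum is what you want, NumPy provides one.
elementwise = np.maximum(a, b)
```

So whenever max() or min() receives Series instead of scalars, this error appears regardless of what the Series contain.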

score 0 · Answer 2 · answered Jun 11 '21 at 16:35

The issue was with the hierarchical indexing of the columns after the pivoting operation. After pivoting:

     element id  id_2  group  condition   vol       size      height
level                                     l1 l2 l3  l1 l2 l3  l1 l2 l3
.....

Flattening the column index solved the issue. I used code from this answer:

rt.columns = [' '.join(col).strip() for col in rt.columns.values]

Resulting dataframe:

 element id  id_2  group  condition   vol l1   vol l2   vol l3   size l1   size l2   etc.
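
For anyone who wants to reproduce the fix, here is a self-contained sketch on toy data. The values are made up, only a subset of the question's columns is used, and joining with an underscore instead of a space is my own variation so that the flattened names contain no spaces.

```python
import pandas as pd

# Toy data shaped like the question's frame (values are made up).
df = pd.DataFrame({
    "element": ["C11", "C11", "C11", "C12", "C12", "C12"],
    "group": ["cont"] * 6,
    "level": ["l1", "l2", "l3"] * 2,
    "vol": [10.0, 20.0, 40.0, 30.0, 60.0, 90.0],
    "size": [2.0, 3.0, 4.0, 3.0, 5.0, 6.0],
})

rt = pd.pivot_table(df, values=["vol", "size"],
                    index=["element", "group"],
                    columns=["level"]).reset_index()

# After pivoting, the columns are two-level tuples such as ('size', 'l1').
was_multiindex = isinstance(rt.columns, pd.MultiIndex)

# Flatten: ('size', 'l1') -> 'size_l1', ('element', '') -> 'element'.
rt.columns = ["_".join(col).strip("_") for col in rt.columns.values]

# The ratio is now computed from ordinary single-level columns.
rt["l1l3_size"] = rt["size_l1"] / rt["size_l3"]
```

Flattening before creating the ratio columns means that every later lookup by a plain column name, such as `rt["l1l3_size"]`, refers to a single-level column.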
