I'm getting the error shown at the bottom when trying to run the code below. I read this whole thread, but I'm still at a loss and can't figure out where the error is coming from.
Dataframe:
  element  id   id_2 group condition level         vol      size    height
0     C11  28  CTL11  cont   control    l1  18687.3750  0.190136  0.682789
1     C11  28  CTL11  cont   control    l2  16797.3750  0.181322  0.829770
2     C11  28  CTL11  cont   control    l3  24813.0000  0.204907  0.812723
3     C12  29  CTL12  cont   control    l1  20069.4375  0.174686  0.719480
4     C12  29  CTL12  cont   control    l2  17323.8750  0.149539  0.836107
Code:
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
from bioinfokit.analys import stat

# work on a copy so the original dataframe is left untouched
ratios = df1.copy()
# aggregate the data by level, keeping all other relevant variables as grouping keys
ratios = ratios.groupby(["element", "id", "id_2", "group", "condition", "level"],
                        as_index=False).mean()
# pivot table: each level becomes one column
rt = pd.pivot_table(ratios, values=['vol', 'size', 'height'],
                    index=["element", "id", "id_2", "group", "condition"],
                    columns=['level']).reset_index()
After pivoting based on the level column (l1, l2 and l3), I create new columns based on the division between certain columns (e.g. vol or size).
# calculations
rt['l1l3_vol'] = rt['vol']['l1'] / rt['vol']['l3']
rt['l1l3_size'] = rt['size']['l1'] / rt['size']['l3']
# run test
design = 'group ~ C(l1l3_size)'
model = ols(design, data=rt).fit()
anova_table = sm.stats.anova_lm(model, typ=3)
# multicomp
res = stat()
res.tukey_hsd(df=rt, res_var='l1l3_size', xfac_var='group',
              anova_model=design)
The code runs up to and including the res = stat() line and only raises an error at the res.tukey_hsd call. I suspect the error might be related to the division operations, but I'm not sure how to overcome this issue.
Any idea is very welcome!
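One thing that may be worth checking is the column structure after the pivot step: with several entries in values, pd.pivot_table returns MultiIndex columns, and downstream tools may not expect that. The following is only a sketch built from the five sample rows above. The flat frame and the underscore-joined column names such as size_l1 are illustrative assumptions, not part of the original code, and it is not confirmed that this is the cause of the error.

```python
import pandas as pd

# The five sample rows shown in the dataframe above
df1 = pd.DataFrame({
    "element": ["C11", "C11", "C11", "C12", "C12"],
    "id": [28, 28, 28, 29, 29],
    "id_2": ["CTL11", "CTL11", "CTL11", "CTL12", "CTL12"],
    "group": ["cont"] * 5,
    "condition": ["control"] * 5,
    "level": ["l1", "l2", "l3", "l1", "l2"],
    "vol": [18687.3750, 16797.3750, 24813.0000, 20069.4375, 17323.8750],
    "size": [0.190136, 0.181322, 0.204907, 0.174686, 0.149539],
    "height": [0.682789, 0.829770, 0.812723, 0.719480, 0.836107],
})

# Same pivot as in the question: one column per (value, level) pair
rt = pd.pivot_table(df1, values=["vol", "size", "height"],
                    index=["element", "id", "id_2", "group", "condition"],
                    columns=["level"]).reset_index()

# Inspect the column index; here it is a two-level MultiIndex
print(type(rt.columns))
print(rt.columns.tolist())

# Hypothetical flattening: join the non-empty parts of each column tuple,
# e.g. ('size', 'l1') -> 'size_l1' and ('element', '') -> 'element'
flat = rt.copy()
flat.columns = ["_".join(part for part in col if part) for col in rt.columns]

# The ratio can then be computed on ordinary, single-level columns
flat["l1l3_size"] = flat["size_l1"] / flat["size_l3"]
print(flat[["element", "group", "l1l3_size"]])
```

If rt.columns prints as a MultiIndex, passing a flattened frame like this to the later steps would be one thing to try.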
EDIT: Full error
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-f9bb6e4e7e5a> in <module>
42 res = stat()
43 res.tukey_hsd(df=rt, res_var='l1l3_size', xfac_var='group',
---> 44 anova_model=design)
45 # print stats outputs
46 print(anova_table)
/opt/anaconda3/envs/icvs/lib/python3.7/site-packages/bioinfokit/analys.py in tukey_hsd(self, df, res_var, xfac_var, anova_model, phalpha, ss_typ)
797 comp_pairs = [(ele1, ele2) for i, ele1 in enumerate(list(mult_group)) for ele2 in list(mult_group)[i + 1:]]
798 for p in comp_pairs:
--> 799 mean_diff = max(mult_group[p[0]], mult_group[p[1]]) - min(mult_group[p[0]], mult_group[p[1]])
800 # count for groups; this is useful when sample size not equal -- Tukey-Kramer
801 group1_count, group2_count = mult_group_count[p[0]], mult_group_count[p[1]]
/opt/anaconda3/envs/icvs/lib/python3.7/site-packages/pandas/core/generic.py in __nonzero__(self)
1441 def __nonzero__(self):
1442 raise ValueError(
-> 1443 f"The truth value of a {type(self).__name__} is ambiguous. "
1444 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
1445 )
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
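The last frames of the traceback point at Python's built-in max() and min() being applied to pandas objects. As a minimal sketch with made-up values (a and b are hypothetical stand-ins, not the actual contents of mult_group), the following reproduces the same message, which would suggest that the per-group means are Series rather than scalars at that point:

```python
import pandas as pd

# Hypothetical stand-ins for two per-group results that are Series, not scalars
a = pd.Series([0.19, 0.17])
b = pd.Series([0.20, 0.15])

# The built-in max() compares its arguments and then needs a single True/False,
# which pandas refuses to produce for a multi-element Series
try:
    max(a, b)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# With scalars (here the mean of each Series) the same call works
scalar_max = max(a.mean(), b.mean())
print(scalar_max)
```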