I'm new to pandas dataframes and would appreciate help with the following problem (similar to this). I have the following data:
from pandas import DataFrame

data = {'Cat1': [2,1,2,1,2,1,2,1,1,1,2],
        'Cat2': [0,0,0,0,0,0,1,1,1,1,1],
        'values': [1,2,3,1,2,3,1,2,3,5,1]}
my_data = DataFrame(data)
I would like to perform a ttest_ind for every category in Cat2 to distinguish between the categories in Cat1.
The way I see it, I could separate the data into
cat1_1 = my_data[my_data['Cat1']==1]
cat1_2 = my_data[my_data['Cat1']==2]
And then loop through every value in Cat2 to perform a t-test:
from scipy.stats import ttest_ind

for cat2 in [0, 1]:
    subset_1 = cat1_1[cat1_1['Cat2'] == cat2]
    subset_2 = cat1_2[cat1_2['Cat2'] == cat2]
    t, p = ttest_ind(subset_1['values'], subset_2['values'])
But this seems really convoluted. Could there be a simpler solution, maybe with groupby? Thanks a lot!
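For reference, here is a minimal sketch of what a groupby-based version of the loop might look like. It assumes ttest_ind comes from scipy.stats and that pandas is imported as pd; the helper name ttest_by_cat1 and the layout of the results table are only illustrative, not an established idiom.

```python
import pandas as pd
from scipy.stats import ttest_ind

data = {'Cat1': [2,1,2,1,2,1,2,1,1,1,2],
        'Cat2': [0,0,0,0,0,0,1,1,1,1,1],
        'values': [1,2,3,1,2,3,1,2,3,5,1]}
my_data = pd.DataFrame(data)

def ttest_by_cat1(group):
    # Hypothetical helper: split one Cat2 group into its two Cat1 samples
    # and run an independent two-sample t-test on their 'values'.
    sample_1 = group.loc[group['Cat1'] == 1, 'values']
    sample_2 = group.loc[group['Cat1'] == 2, 'values']
    t, p = ttest_ind(sample_1, sample_2)
    return pd.Series({'t': t, 'p': p})

# groupby('Cat2') yields (key, sub-frame) pairs, one per Cat2 category.
results = pd.DataFrame(
    {cat2: ttest_by_cat1(group) for cat2, group in my_data.groupby('Cat2')}
).T
results.index.name = 'Cat2'
```

This should give one row per Cat2 category with columns t and p, so the manual pre-splitting into cat1_1 and cat1_2 would no longer be needed.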