I have a DataFrame (df) that looks like this:

           Ens_prot_ID      Ens_gene_ID Sample     TPM      ppm   ppm/TPM
0      ENSP00000416240  ENSG00000109072  liver  2540.4    0.003  0.000001
21597  ENSP00000226218  ENSG00000109072  liver  2540.4  110.000  0.043300
...

The code below fails with "KeyError: 0":

from scipy import stats
proteins=df['Ens_prot_ID'].unique()
stats.f_oneway([df[df['Ens_prot_ID'] == prot]['ppm/TPM'] for prot in proteins])

The following loop, however, runs without any problem:

from scipy import stats
proteins=df['Ens_prot_ID'].unique()
for prot in proteins:
    df[df['Ens_prot_ID'] == prot]['ppm/TPM']

So it seems the issue is that f_oneway() does not like the form of my input. Is there a way to get the function to accept the groups without typing each one in by hand?

1 Answer

You can do it this way:

stats.f_oneway(*(df[df['Ens_prot_ID'] == prot]['ppm/TPM'] for prot in proteins))

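f_oneway() expects each group as a separate positional argument, not a single list of groups. The * unpacks the iterable so that every element is passed as its own argument: https://pythontips.com/2013/08/04/args-and-kwargs-in-python-explained/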
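
To make the difference concrete, here is a minimal sketch in plain Python. count_groups is a hypothetical stand-in, not part of scipy; it only mimics a function that takes a variable number of positional arguments.

```python
def count_groups(*samples):
    """Hypothetical stand-in for a function that takes one positional argument per sample."""
    return len(samples)

groups = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

n_without_star = count_groups(groups)   # one argument: the whole list
n_with_star = count_groups(*groups)     # three arguments: one per group
print(n_without_star, n_with_star)      # prints "1 3"
```

Without the *, the function sees a single "sample" that happens to be a list of groups, which is the situation in the question.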

I also changed the list comprehension to a generator expression, which avoids building an intermediate list. Note that the * still consumes every group before the call, so the saving is small.
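
As an alternative sketch, the groups can also be collected with groupby instead of one boolean mask per protein. The values below are made up for illustration; only the column names come from the question.

```python
import pandas as pd
from scipy import stats

# Made-up values for illustration only; column names follow the question.
df = pd.DataFrame({
    "Ens_prot_ID": ["P1", "P1", "P1", "P2", "P2", "P2", "P3", "P3", "P3"],
    "ppm/TPM": [0.10, 0.12, 0.11, 0.20, 0.22, 0.21, 0.30, 0.33, 0.31],
})

# One Series of ppm/TPM values per protein, without repeated boolean masks.
groups = [values for _, values in df.groupby("Ens_prot_ID")["ppm/TPM"]]

# Unpack so that each group is passed as a separate positional argument.
f_stat, p_value = stats.f_oneway(*groups)
print(f_stat, p_value)
```

This scans the DataFrame once rather than once per protein, which may matter when there are many unique IDs.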

mechanical_meat