
I found the thread "Ignore python multiple return value", but I still don't understand how to obtain only the p-value from a t-test. I'm trying to collect just the p-values in a vector and I'm really struggling. I tried the `fun()[1]` method, but the result has an empty index.

My code looks like this:

# Statistical analysis: unpaired t-test
import pandas as pd
from scipy import stats
# Sorting for dosage
drug_1_1 = table.iloc[:,1][(table['Dose drug 1'] == 1)]
drug_2_1 = table.iloc[:,3][(table['Dose drug 2'] == 1)]
drug_1_2 = table.iloc[:,1][(table['Dose drug 1'] == 2)]
drug_2_2 = table.iloc[:,3][(table['Dose drug 2'] == 2)]
drug_1_3 = table.iloc[:,1][(table['Dose drug 1'] == 3)]
drug_2_3 = table.iloc[:,3][(table['Dose drug 2'] == 3)]
drug_1_4 = table.iloc[:,1][(table['Dose drug 1'] == 4)]
drug_2_4 = table.iloc[:,3][(table['Dose drug 2'] == 4)]
drug_1_5 = table.iloc[:,1][(table['Dose drug 1'] == 5)]
drug_2_5 = table.iloc[:,3][(table['Dose drug 2'] == 5)]
drug_1_6 = table.iloc[:,1][(table['Dose drug 1'] == 6)]
drug_2_6 = table.iloc[:,3][(table['Dose drug 2'] == 6)]
drug_1_7 = table.iloc[:,1][(table['Dose drug 1'] == 7)]
drug_2_7 = table.iloc[:,3][(table['Dose drug 2'] == 7)]
drug_1_8 = table.iloc[:,1][(table['Dose drug 1'] == 8)]
drug_2_8 = table.iloc[:,3][(table['Dose drug 2'] == 8)]

# Expressing p-values in a vector
P_values = pd.DataFrame()
P_values['1'] = stats.ttest_ind(drug_1_1,drug_2_1)[1]
P_values['2'] = stats.ttest_ind(drug_1_2,drug_2_2)[1]
P_values['3'] = stats.ttest_ind(drug_1_3,drug_2_3)[1]
P_values['4'] = stats.ttest_ind(drug_1_4,drug_2_4)[1]
P_values['5'] = stats.ttest_ind(drug_1_5,drug_2_5)[1]
P_values['6'] = stats.ttest_ind(drug_1_6,drug_2_6)[1]
P_values['7'] = stats.ttest_ind(drug_1_7,drug_2_7)[1]
P_values['8'] = stats.ttest_ind(drug_1_8,drug_2_8)[1]
P_values.index.names = ['Dose']
print(P_values)

Which returns:

Empty DataFrame
Columns: [1, 2, 3, 4, 5, 6, 7, 8]
Index: []

Strangely, it seems to work if the `[1]` is not placed on the first line, but then every other line returns the same value for both 0 and 1, like so: https://imgur.com/a/9H7YXL7

Am I writing something wrong?

  • The ttest returns a tuple of two values. If you want to retrieve only one of the two, you can index into the returned values: `stats.ttest_ind(drug_1_1,drug_2_1)[0]` is the statistic, `stats.ttest_ind(drug_1_1,drug_2_1)[1]` is the pvalue – G. Anderson Nov 27 '18 at 20:24
  • Hey! Thank you for responding. I've tried the method you propose with `[1]`, but I get this output: `Empty DataFrame Columns: [1, 2, 3, 4, 5, 6, 7, 8] Index: []` – Email for rat facts Nov 27 '18 at 20:34
  • Strangely enough this happens. [1] on the first row disables it. But everywhere else simply returns the [1] as both 0 and 1 https://imgur.com/a/9H7YXL7 – Email for rat facts Nov 27 '18 at 20:42
  • For one thing, if you want it in a vector, I wouldn't make it a dataframe. As a check, if you just call `print(stats.ttest_ind(drug_1_1,drug_2_1))`, what is returned? If that returns `(-0.074465, 0.940906)`, then add each `[1]` to an array instead of a df – G. Anderson Nov 27 '18 at 21:22
  • What would you make it then? How do I make an array? Yes, that's exactly what it returns! – Email for rat facts Nov 27 '18 at 21:26
  • See my answer below and let me know if it helps – G. Anderson Nov 27 '18 at 21:46
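The empty result discussed in these comments can be reproduced without any t-test. The sketch below assumes only pandas and uses made-up placeholder numbers, not values from the question. It shows that assigning a scalar to a column of a DataFrame with no rows leaves it empty, and that once a two-element value creates rows 0 and 1, later scalars are repeated in both rows, which is probably what the screenshot shows.

```python
import pandas as pd

# A scalar assigned to a DataFrame with no rows has nothing to
# broadcast over, so the column is created but stays empty.
empty = pd.DataFrame()
empty['1'] = 0.94                 # placeholder p-value
print(empty)                      # Empty DataFrame, Columns: [1], Index: []

# If the first column receives a two-element result (statistic, p-value),
# the frame gets rows 0 and 1, and later scalars repeat in both rows.
frame = pd.DataFrame()
frame['1'] = [-0.07, 0.94]        # placeholder (statistic, p-value) pair
frame['2'] = 0.51                 # scalar, broadcast to both rows
print(frame)
```

Collecting the scalars in a list, dict, or Series avoids both effects.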

1 Answer


So, based on what you've written so far, it looks like you're doing a lot of (possibly) unnecessary hard-coding of values and storing of variables. If you need to reuse drug_1_1 and the rest, leave the upper portion of the code as is. If you don't need them later, you should be able to loop over the range of values and do all of your lookups, t-tests, and p-value storing at once, as below. (Note: I couldn't test this, as I have no idea what your data looks like.)

p_vals=[]
for i in range(1,9):
    p_vals.append(stats.ttest_ind(table.iloc[:,1][(table['Dose drug 1'] == i)],
                                  table.iloc[:,3][(table['Dose drug 2'] == i)])[1])
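If a labelled vector is preferred over a plain list, the same loop can fill a dict that is then turned into a pandas Series indexed by dose. This is only a sketch on synthetic data: the `Response drug 1` and `Response drug 2` column names and the random values are assumptions standing in for the real `table`, which isn't shown in the question.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for `table`; the response column names are hypothetical.
rng = np.random.default_rng(0)
doses = np.repeat(np.arange(1, 9), 5)          # 5 measurements per dose 1..8
table = pd.DataFrame({
    'Dose drug 1': doses,
    'Response drug 1': rng.normal(size=doses.size),
    'Dose drug 2': doses,
    'Response drug 2': rng.normal(size=doses.size),
})

p_vals = {}
for dose in range(1, 9):                       # doses 1 through 8
    a = table.loc[table['Dose drug 1'] == dose, 'Response drug 1']
    b = table.loc[table['Dose drug 2'] == dose, 'Response drug 2']
    # .pvalue is the same number as indexing the result with [1]
    p_vals[dose] = stats.ttest_ind(a, b).pvalue

p_values = pd.Series(p_vals, name='p-value')
p_values.index.name = 'Dose'
print(p_values)
```

Selecting by column name with `.loc` is used here instead of `iloc[:,1]` and `iloc[:,3]` so the sketch doesn't depend on column order.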
G. Anderson
  • This really fixed it. I wasn't even aware that you could do something like that. I'm really new to this, so I'm trying to understand what you did. Could you please explain one thing? How come the range is 1-9 instead of 1-8? – Email for rat facts Nov 27 '18 at 22:15
  • Glad I could help! Ranges in Python start at zero by default and are non-inclusive on the upper end. So if you run `[i for i in range(8)]` vs `[i for i in range(1,9)]`, you can see a list of the values contained in the respective ranges – G. Anderson Nov 27 '18 at 22:21
  • Ah, I see. So in this case 0 and 10 return nan, but setting the range to 1-8 returns one outcome too few. This is because 1,8 generates values from 1 up to 7 (so not including 8, in which case 9 must be used to include 8). Thank you for all the help! I went and modified previous parts of the code with the loop and it's working great! – Email for rat facts Nov 28 '18 at 10:36
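The three ranges discussed in these comments can be compared directly. The variable names below are made up for illustration.

```python
# range(stop) starts at 0; range(start, stop) never includes stop.
zero_based = list(range(8))      # 0 through 7
doses = list(range(1, 9))        # 1 through 8, matching the eight doses
too_short = list(range(1, 8))    # 1 through 7, dose 8 is missing

print(zero_based)
print(doses)
print(too_short)
```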