
I would like to run a test for each species of ant that I have (columns 9-13) that compares the values in the following columns: Attack_count, Attack_percent, Survival_count, and Survival_percent. For each ant species, I want to compare the rows where that species is present (ants > 0) with the rows where it is absent (ants == 0).

This is what I have tried, and it returns pvalue=NA multiple times. This is my first attempt at writing a for loop and I do not know how to incorporate i.

I added in the bootstrap code. I am not exactly sure how to make a reproducible example, but I wrote one below.

bootstrap_ttest <- function(data1,data2,resamples){

    delta_real <- mean(data1) - mean(data2) ##real diff btwn means
    pooled_data <- c(data1, data2)

    null_differences <-c()
    for(x in 1:1000){

        data1_null <- sample(pooled_data,size=length(data1), replace=T)
        data2_null <- sample(pooled_data,size=length(data2), replace=T)

        delta_null <- mean(data1_null) - mean(data2_null)
        null_differences <- c(null_differences, delta_null )

    }## end of resampling loop 

    pvalue <- sum(abs(null_differences) > abs(delta_real))/length(null_differences)
    cat("pvalue:", pvalue)
    assign("pvalue", pvalue,.GlobalEnv) 
    assign("null_dist", null_differences,.GlobalEnv )
    assign("delta_obs", delta_real,.GlobalEnv )

}
ac_pvals = vector(length = ncol(ants))
ap_pvals = vector(length = ncol(ants))
sc_pvals = vector(length = ncol(ants))
sp_pvals = vector(length = ncol(ants))

for(i in 1:ncol(ants)){

  ants = data.frame(mainbroca[,9:13])

test1 = bootstrap_ttest(data1=mainbroca$Attack_count[ants == 0], 
                data2=mainbroca$Attack_count[ants>0], resamples=1000)
test2 = bootstrap_ttest(data1=mainbroca$Attack_percent[ants == 0], 
                data2=mainbroca$Attack_percent[ants>0], resamples=1000)
test3 = bootstrap_ttest(data1=mainbroca$Survival_count[ants == 0], 
                data2=mainbroca$Survival_count[ants>0], resamples=1000)
test4 = bootstrap_ttest(data1=mainbroca$Survival_percent[ants == 0], 
                data2=mainbroca$Survival_percent[ants>0], resamples=1000)

ac_pvals[1] = c(test1)
ap_pvals[1] = c(test2)
sc_pvals[1] = c(test3)
sp_pvals[1] = c(test4)

}

#reproducible

fakerow1 <- c(1,2,3,4,100,80,60,40,20)
fakerow2 <- c(1,2,3,4,100,80,60,40,20)
fakedata = rbind(fakerow1,fakerow2)
colnames(fakedata) = c('ac','ap','sc','sp','ant1','ant2','ant3','ant4','ant5')
Gautam
  • Can you provide a sample of the data you used? Thanks. – Taher A. Ghaleb Nov 19 '19 at 16:03
  • @Jannice Newson, which package does the function `bootstrap_ttest` come from? I'm not able to find it online. Also, instead of attached images, can you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) of your data? It will make copy/paste easier. – dc37 Nov 19 '19 at 17:32
  • Thanks for the update, I have updated my answer accordingly. – dc37 Nov 20 '19 at 04:11

3 Answers


I have modified your code to do what I think you want. You were looping over the columns and needed to filter the ants dataset by column and then save the results to your vectors without overwriting.

ants = data.frame(mainbroca[,9:13])

ac_pvals = vector()
ap_pvals = vector()
sc_pvals = vector()
sp_pvals = vector()

for(i in 1:ncol(ants)){

  # bootstrap_ttest() assigns `pvalue` in the global environment,
  # so it is read back after each call and appended to the matching vector.
  # Both groups come from mainbroca, split on column i of ants.
  bootstrap_ttest(data1=mainbroca$Attack_count[ants[,i] == 0], 
                  data2=mainbroca$Attack_count[ants[,i]>0], resamples=1000)
  ac_pvals = c(ac_pvals, pvalue)
  bootstrap_ttest(data1=mainbroca$Attack_percent[ants[,i] == 0], 
                  data2=mainbroca$Attack_percent[ants[,i]>0], resamples=1000)
  ap_pvals = c(ap_pvals, pvalue)
  bootstrap_ttest(data1=mainbroca$Survival_count[ants[,i] == 0], 
                  data2=mainbroca$Survival_count[ants[,i]>0], resamples=1000)
  sc_pvals = c(sc_pvals, pvalue)
  bootstrap_ttest(data1=mainbroca$Survival_percent[ants[,i] == 0], 
                  data2=mainbroca$Survival_percent[ants[,i]>0], resamples=1000)
  sp_pvals = c(sp_pvals, pvalue)
}
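
As a side note on why the original loop gave pvalue=NA: a minimal sketch, using a made-up toy data frame (the names `toy` and `toy_ants` and their values are hypothetical, not taken from mainbroca). Indexing a single column with the whole `ants == 0` matrix produces a logical index longer than the column, so R pads the result with NA and `mean()` then returns NA. Indexing with one column of `ants` avoids this.

```r
# Hypothetical toy data; names and values are illustrative only
toy <- data.frame(
  Attack_count = c(5, 3, 8, 1),
  ant1 = c(0, 2, 0, 1),
  ant2 = c(1, 0, 0, 4)
)
toy_ants <- toy[, c("ant1", "ant2")]

# Whole data frame as index: `toy_ants == 0` is a 4 x 2 logical matrix
# (8 values), which is longer than the 4-element Attack_count vector.
# TRUE positions past the end of the vector come back as NA.
whole_frame <- toy$Attack_count[toy_ants == 0]
print(whole_frame)
print(mean(whole_frame))   # NA, which then propagates into the p-value

# One column as index: a 4-element logical vector, no padding
one_column <- toy$Attack_count[toy_ants[, 1] == 0]
print(one_column)
print(mean(one_column))
```

This is why the loop needs `ants[,i]` rather than `ants` inside the square brackets.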
D. Stevens
  • This worked, but the vectors to save pvalues do not have the pvalues in them. The pvalues print out after running the loop, but when I view the vectors, there are different numbers within them – Jannice Newson Nov 19 '19 at 17:01
  • I must admit I assumed that bootstrap_ttest returned the p-value. I have updated my code to fit your function. Your fake data looks quite different from what your code implied: there are a different number of columns and different variable names. You will also probably need more rows to bootstrap. Also, your bootstrap code seems to be hardcoded to run 1000 times rather than use resamples. – D. Stevens Nov 20 '19 at 12:54

Thanks for updating your question and providing information about your bootstrap_ttest function.

1) There is a small mistake in your function: the resamples argument is not used. I think you should replace for(x in 1:1000) with for(x in 1:resamples).

2) Here is a possible way to simplify your code (maybe not the best one) using two nested for loops. The outer loop goes over the columns of ants, and the inner loop applies the bootstrap_ttest function to each tested column for the ants column selected. You can then bind the p-values and the delta_obs values into two tables that you can ultimately bind together.

Here is the code to define variables to loop:

# Defining columns to be tested 
colonne = c("Attack_count","Attack_percent","Survival_count","Survival_percent")
# Defining ants dataframe
ants = mainbroca[,c(9:13)]

And here is the double loop:

result_pval = NULL
result_t_boot = NULL

for(j in 1:ncol(ants))
{
  P.val =NULL
  t_bootstrap = NULL
  for(i in 1:length(colonne))
  {
    t <- bootstrap_ttest(data1 = mainbroca[ants[,j]==0,colonne[i]], data2 = mainbroca[ants[,j]>0,colonne[i]], resamples = 1000)
    P.val = c(P.val,pvalue)
    t_bootstrap = c(t_bootstrap,t)
  }
  result_pval = cbind(result_pval,P.val)
  result_t_boot = cbind(result_t_boot,t_bootstrap)
}

And to get the final dataframe:

colnames(result_pval) = paste0(colnames(ants),"_pval")
colnames(result_t_boot) = paste0(colnames(ants),"_DeltaObs")
Final_df = cbind(result_pval,result_t_boot)
rownames(Final_df) = colonne

At the end, you should get a table of 10 columns and 4 rows, since ants has 5 columns (9 to 13). The first 5 columns contain the pvalue from your bootstrap_ttest function for each ants column, and the last 5 columns contain the delta_obs calculated for each ants column. Each row is one of the tested columns (Attack_count, Survival_count, ...).

Is this what you are looking for?
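
For point 1, here is a minimal sketch of what the function could look like once `resamples` drives the loop. The name `bootstrap_ttest2` is hypothetical, and returning a list instead of calling `assign(..., .GlobalEnv)` is my own assumption about a tidier design, not something from the question. The resampling scheme itself is the same as in the question.

```r
# Sketch only: same resampling scheme as the question's bootstrap_ttest,
# but `resamples` controls the number of iterations and the results are
# returned in a list rather than assigned to the global environment.
bootstrap_ttest2 <- function(data1, data2, resamples = 1000) {
  delta_real <- mean(data1) - mean(data2)   # observed difference between means
  pooled_data <- c(data1, data2)

  null_differences <- numeric(resamples)    # preallocate instead of growing
  for (x in seq_len(resamples)) {
    data1_null <- sample(pooled_data, size = length(data1), replace = TRUE)
    data2_null <- sample(pooled_data, size = length(data2), replace = TRUE)
    null_differences[x] <- mean(data1_null) - mean(data2_null)
  }

  # Share of null differences more extreme than the observed one
  pvalue <- mean(abs(null_differences) > abs(delta_real))
  list(pvalue = pvalue, delta_obs = delta_real, null_dist = null_differences)
}

# Usage on made-up numbers
set.seed(1)
res <- bootstrap_ttest2(c(1, 2, 3, 4, 5), c(11, 12, 13, 14, 15), resamples = 200)
print(res$pvalue)
print(res$delta_obs)
print(length(res$null_dist))
```

With a return value like this, the loops above could store `res$pvalue` directly instead of reading a global `pvalue` after each call.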

dc37

I ended up doing this.

ants = data.frame(mainbroca[,9:13])
cols_table = c('Attack_count', 'Attack_percent','Survival_count', 'Survival_percent')

ac_pvals = NULL
ap_pvals = NULL
sc_pvals = NULL
sp_pvals = NULL

#attack count
for(i in 1:ncol(ants)){

bootstrap_ttest(data1=mainbroca$Attack_count[ants[,i] == 0], 
                data2=mainbroca$Attack_count[ants[,i]>0], resamples=1000)
  ac_pvals = c(ac_pvals, pvalue)
}

#attack percent
for(i in 1:ncol(ants)){
bootstrap_ttest(data1=mainbroca$Attack_percent[ants[,i] == 0], 
                data2=mainbroca$Attack_percent[ants[,i]>0], resamples=1000)
   ap_pvals = c(ap_pvals, pvalue)
}

#survival count
for(i in 1:ncol(ants)){
bootstrap_ttest(data1=mainbroca$Survival_count[ants[,i] == 0], 
                data2=mainbroca$Survival_count[ants[,i]>0], resamples=1000)
   sc_pvals = c(sc_pvals, pvalue)
}

#survival percent
for(i in 1:ncol(ants)){
bootstrap_ttest(data1=mainbroca$Survival_percent[ants[,i] == 0], 
                data2=mainbroca$Survival_percent[ants[,i]>0], resamples=1000)
  sp_pvals = c(sp_pvals, pvalue)
}

ttest1 = print(ac_pvals)
ttest2 = print(ap_pvals)
ttest3 = print(sc_pvals)
ttest4 = print(sp_pvals)

ptable = cbind(ac_pvals, ap_pvals,sc_pvals,sp_pvals)
rownames(ptable) = c("Was", "Pmor", "Monom", "Brachy", "Sinv")
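
The four loops above could probably be collapsed into one nested `sapply` call. This is only a sketch under stated assumptions: `fake` is a made-up stand-in for mainbroca with fewer columns, and `boot_pvalue` is a hypothetical helper that returns the p-value directly (same resampling idea as bootstrap_ttest, but it is not the original function).

```r
# Hypothetical stand-in for mainbroca; values are random, not real data
set.seed(42)
n <- 40
fake <- data.frame(
  Attack_count   = rpois(n, 5),
  Survival_count = rpois(n, 3),
  ant1 = rpois(n, 1),
  ant2 = rpois(n, 1)
)
outcome_cols <- c("Attack_count", "Survival_count")
ant_cols <- c("ant1", "ant2")

# Hypothetical helper: returns only the p-value
boot_pvalue <- function(data1, data2, resamples = 200) {
  delta_real <- mean(data1) - mean(data2)
  pooled <- c(data1, data2)
  null_diffs <- replicate(resamples, {
    mean(sample(pooled, length(data1), replace = TRUE)) -
      mean(sample(pooled, length(data2), replace = TRUE))
  })
  mean(abs(null_diffs) > abs(delta_real))
}

# Outer sapply: one column of the result per outcome variable.
# Inner sapply: one row per ant species.
ptable_sketch <- sapply(outcome_cols, function(outcome) {
  sapply(ant_cols, function(ant) {
    boot_pvalue(fake[[outcome]][fake[[ant]] == 0],
                fake[[outcome]][fake[[ant]] > 0])
  })
})
print(ptable_sketch)
```

The result has the same layout as ptable: ant species in rows and tested variables in columns, with row and column names taken from `ant_cols` and `outcome_cols`.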