For loops for running a function on columns

Question

I would like to run a test for each species of ant that I have (columns 9-13) that compares values in the following columns: Attack_count, Attack_percent, Survival_count, and Survival_percent. For each species, I want to compare the rows where that species is present (ants > 0) with the rows where it is absent (ants == 0).

This is what I have tried, and it returns pvalue=NA multiple times. This is my first attempt at writing a for loop and I do not know how to incorporate i.

I have added the bootstrap code. I am not exactly sure how to make a reproducible example, but I have included one below.

bootstrap_ttest <- function(data1,data2,resamples){

    delta_real <- mean(data1) - mean(data2) ##real diff btwn means
    pooled_data <- c(data1, data2)

    null_differences <-c()
    for(x in 1:1000){

        data1_null <- sample(pooled_data,size=length(data1), replace=T)
        data2_null <- sample(pooled_data,size=length(data2), replace=T)

        delta_null <- mean(data1_null) - mean(data2_null)
        null_differences <- c(null_differences, delta_null )

    }## end of resampling loop 

    pvalue <- sum(abs(null_differences) > abs(delta_real))/length(null_differences)
    cat("pvalue:", pvalue)
    assign("pvalue", pvalue,.GlobalEnv) 
    assign("null_dist", null_differences,.GlobalEnv )
    assign("delta_obs", delta_real,.GlobalEnv )

}
ac_pvals = vector(length = ncol(ants))
ap_pvals = vector(length = ncol(ants))
sc_pvals = vector(length = ncol(ants))
sp_pvals = vector(length = ncol(ants))

for(i in 1:ncol(ants)){

  ants = data.frame(mainbroca[,9:13])

test1 = bootstrap_ttest(data1=mainbroca$Attack_count[ants == 0], 
                data2=mainbroca$Attack_count[ants>0], resamples=1000)
test2 = bootstrap_ttest(data1=mainbroca$Attack_percent[ants == 0], 
                data2=mainbroca$Attack_percent[ants>0], resamples=1000)
test3 = bootstrap_ttest(data1=mainbroca$Survival_count[ants == 0], 
                data2=mainbroca$Survival_count[ants>0], resamples=1000)
test4 = bootstrap_ttest(data1=mainbroca$Survival_percent[ants == 0], 
                data2=mainbroca$Survival_percent[ants>0], resamples=1000)

ac_pvals[1] = c(test1)
ap_pvals[1] = c(test2)
sc_pvals[1] = c(test3)
sp_pvals[1] = c(test4)

}

#reproducible

fakerow1 <- c(1,2,3,4,100,80,60,40,20)
fakerow2 <- c(1,2,3,4,100,80,60,40,20)
fakedata = rbind(fakerow1,fakerow2)
colnames(fakedata) = c('ac','ap','sc','sp','ant1','ant2','ant3','ant4','ant5')

@Jannice Newson, which package does the function `bootstrap_ttest` come from? I'm not able to find it online. Also, instead of attaching images, can you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) of your data? It will make copy/paste easier. — dc37, Nov 19 '19 at 17:32

D. Stevens · Answer 1 · 2019-11-20T12:50:01.760

I have modified your code to do what I think you want. You were looping over the columns and needed to filter the ants dataset by column and then save the results to your vectors without overwriting.

ants = data.frame(mainbroca[,9:13])

ac_pvals = vector()
ap_pvals = vector()
sc_pvals = vector()
sp_pvals = vector()

for(i in 1:ncol(ants)){

# bootstrap_ttest() assigns `pvalue` in the global environment,
# so it is read from there after each call.
bootstrap_ttest(data1=mainbroca$Attack_count[ants[,i] == 0], 
                data2=mainbroca$Attack_count[ants[,i]>0], resamples=1000)
ac_pvals = c(ac_pvals, pvalue)
bootstrap_ttest(data1=mainbroca$Attack_percent[ants[,i] == 0], 
                data2=mainbroca$Attack_percent[ants[,i]>0], resamples=1000)
ap_pvals = c(ap_pvals, pvalue)
bootstrap_ttest(data1=mainbroca$Survival_count[ants[,i] == 0], 
                data2=mainbroca$Survival_count[ants[,i]>0], resamples=1000)
sc_pvals = c(sc_pvals, pvalue)
bootstrap_ttest(data1=mainbroca$Survival_percent[ants[,i] == 0], 
                data2=mainbroca$Survival_percent[ants[,i]>0], resamples=1000)
sp_pvals = c(sp_pvals, pvalue)
}

This worked, but the vectors to save pvalues do not have the pvalues in them. The pvalues print out after running the loop, but when I view the vectors, there are different numbers within them — Jannice Newson, Nov 19 '19 at 17:01
I must admit I assumed that bootstrap_ttest returned the p-value. I have updated my code to fit your function. Your fake data looks quite different from what your code implied: it has a different number of columns and different variable names. You will also probably need more rows to bootstrap. Also, your bootstrap code seems to be hardcoded to run 1000 times rather than use resamples. — D. Stevens, Nov 20 '19 at 12:54
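
The two points raised in the comment above (the hardcoded 1000 and the function not returning the p-value) could be addressed along these lines. This is only a sketch, not the asker's final code: `bootstrap_ttest2` is a hypothetical name, and the input vectors are made up for illustration.

```r
# Sketch: same resampling idea as the question's function, but it honours the
# `resamples` argument and returns its results instead of writing globals.
bootstrap_ttest2 <- function(data1, data2, resamples = 1000) {
  delta_real <- mean(data1) - mean(data2)   # observed difference in means
  pooled <- c(data1, data2)

  # Preallocate the result vector rather than growing it inside the loop
  null_differences <- numeric(resamples)
  for (x in seq_len(resamples)) {
    d1 <- sample(pooled, size = length(data1), replace = TRUE)
    d2 <- sample(pooled, size = length(data2), replace = TRUE)
    null_differences[x] <- mean(d1) - mean(d2)
  }

  # Share of null differences more extreme than the observed one
  pvalue <- mean(abs(null_differences) > abs(delta_real))
  list(pvalue = pvalue, delta_obs = delta_real, null_dist = null_differences)
}

set.seed(1)                      # made-up data, for illustration only
a <- rnorm(30, mean = 10)
b <- rnorm(30, mean = 12)
res <- bootstrap_ttest2(a, b, resamples = 500)
res$pvalue
length(res$null_dist)
```

Because the result comes back as a list, a loop can store `res$pvalue` directly and does not depend on a global `pvalue` being overwritten at the right moment.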

dc37 · Answer 2 · 2019-11-20T04:10:17.903

Thanks for updating your question and providing information about your bootstrap_ttest function.

1) There is a small mistake in your function: the resamples argument is not used. I think you should replace for(x in 1:1000) with for(x in 1:resamples).

2) Here is a possible way to simplify your code (maybe not the best one) using two nested for loops. The outer loop goes across the columns of ants, and the inner loop applies the bootstrap_ttest function to each column to be tested. You can then bind the pvalue and delta_obs results into two tables and finally bind those together.

Here is the code to define variables to loop:

# Defining columns to be tested 
colonne = c("Attack_count","Attack_percent","Survival_count","Survival_percent")
# Defining ants dataframe
ants = mainbroca[,c(9:13)]

And here is the double loop:

result_pval = NULL
result_t_boot = NULL

for(j in 1:ncol(ants))
{
  P.val =NULL
  t_bootstrap = NULL
  for(i in 1:length(colonne))
  {
    # The function's last statement assigns delta_obs, so t holds that value
    t <- bootstrap_ttest(data1 = mainbroca[ants[,j]==0,colonne[i]], data2 = mainbroca[ants[,j]>0,colonne[i]], resamples = 1000)
    P.val = c(P.val,pvalue)
    t_bootstrap = c(t_bootstrap,t)
  }
  result_pval = cbind(result_pval,P.val)
  result_t_boot = cbind(result_t_boot,t_bootstrap)
}

And to get the final dataframe:

colnames(result_pval) = paste0(colnames(ants),"_pval")
colnames(result_t_boot) = paste0(colnames(ants),"_DeltaObs")
Final_df = cbind(result_pval,result_t_boot)
rownames(Final_df) = colonne

At the end, you should get a table with 10 columns and 4 rows. The first 5 columns hold the p-value from your bootstrap_ttest function for each ants column, and the last 5 columns hold the delta_obs calculated for each ants column. Each row is one of the tested columns (Attack_count, Attack_percent, Survival_count, Survival_percent).

Does this look like what you are looking for?

This returns `argument is not numeric or logical: returning NA` — Jannice Newson, Nov 20 '19 at 05:01
Do you have any NA values in your data.frame? Can you provide the output of `str(mainbroca)`? — dc37, Nov 20 '19 at 05:03
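
The warning quoted in the comment above is the one `mean()` emits when its argument is not a numeric or logical vector. The thread does not show `mainbroca`, so whether either case below applies to it is an assumption. This sketch uses a made-up data frame to show two common ways the warning can arise, and how a type check narrows it down.

```r
# Made-up data frame standing in for mainbroca; the real data is not shown.
df <- data.frame(Attack_count = c(3, 0, 5, 2),
                 Attack_text  = c("3", "0", "5", "2"),  # numbers stored as text
                 ant1         = c(0, 1, 0, 2),
                 stringsAsFactors = FALSE)

# Case 1: the column is character, so mean() warns and returns NA
m_text <- suppressWarnings(mean(df$Attack_text[df$ant1 == 0]))

# Case 2: the subset is still a data frame (forced here with drop = FALSE;
# tibbles behave this way by default), so mean() again warns and returns NA
m_frame <- suppressWarnings(
  mean(df[df$ant1 == 0, "Attack_count", drop = FALSE])
)

# A plain numeric vector works as expected
m_ok <- mean(df$Attack_count[df$ant1 == 0])

# Checking the class of every column shows which ones need converting
sapply(df, class)
```

Extracting the column with `$` or `[[ ]]` before subsetting the rows always yields a vector, which avoids the second case.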

score 0 · Answer 3 · answered Nov 20 '19 at 23:21

I ended up doing this.

ants = data.frame(mainbroca[,9:13])
cols_table = c('Attack_count', 'Attack_percent','Survival_count', 'Survival_percent')

ac_pvals = NULL
ap_pvals = NULL
sc_pvals = NULL
sp_pvals = NULL

#attack count
for(i in 1:ncol(ants)){

bootstrap_ttest(data1=mainbroca$Attack_count[ants[,i] == 0], 
                data2=mainbroca$Attack_count[ants[,i]>0], resamples=1000)
  ac_pvals = c(ac_pvals, pvalue)
}

#attack percent
for(i in 1:ncol(ants)){
bootstrap_ttest(data1=mainbroca$Attack_percent[ants[,i] == 0], 
                data2=mainbroca$Attack_percent[ants[,i]>0], resamples=1000)
   ap_pvals = c(ap_pvals, pvalue)
}

#survival count
for(i in 1:ncol(ants)){
bootstrap_ttest(data1=mainbroca$Survival_count[ants[,i] == 0], 
                data2=mainbroca$Survival_count[ants[,i]>0], resamples=1000)
   sc_pvals = c(sc_pvals, pvalue)
}

#survival percent
for(i in 1:ncol(ants)){
bootstrap_ttest(data1=mainbroca$Survival_percent[ants[,i] == 0], 
                data2=mainbroca$Survival_percent[ants[,i]>0], resamples=1000)
  sp_pvals = c(sp_pvals, pvalue)
}

ttest1 = print(ac_pvals)
ttest2 = print(ap_pvals)
ttest3 = print(sc_pvals)
ttest4 = print(sp_pvals)

ptable = cbind(ac_pvals, ap_pvals,sc_pvals,sp_pvals)
rownames(ptable) = c("Was", "Pmor", "Monom", "Brachy", "Sinv")
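
The four loops above differ only in the response column, so they could be folded into one nested loop that fills a matrix directly. This is a sketch under stated assumptions: `boot_p` is a hypothetical helper that returns the p-value instead of assigning it globally, and `fake`, `ant1` to `ant3` are made-up stand-ins for `mainbroca` and its ant columns.

```r
# Hypothetical helper: returns the bootstrap p-value directly
boot_p <- function(data1, data2, resamples = 1000) {
  delta_real <- mean(data1) - mean(data2)
  pooled <- c(data1, data2)
  null_diff <- replicate(resamples,
    mean(sample(pooled, length(data1), replace = TRUE)) -
    mean(sample(pooled, length(data2), replace = TRUE)))
  mean(abs(null_diff) > abs(delta_real))
}

# Made-up stand-in for mainbroca (the real data is not shown in the thread)
set.seed(42)
n <- 40
fake <- data.frame(
  Attack_count = rpois(n, 5), Attack_percent = runif(n, 0, 100),
  Survival_count = rpois(n, 3), Survival_percent = runif(n, 0, 100),
  ant1 = rpois(n, 1), ant2 = rpois(n, 1), ant3 = rpois(n, 1)
)
responses <- c("Attack_count", "Attack_percent",
               "Survival_count", "Survival_percent")
species <- c("ant1", "ant2", "ant3")

# One row per species, one column per response; filled cell by cell
ptable <- matrix(NA_real_, nrow = length(species), ncol = length(responses),
                 dimnames = list(species, responses))
for (s in species) {
  absent <- fake[[s]] == 0          # rows where this species was not found
  for (r in responses) {
    ptable[s, r] <- boot_p(fake[[r]][absent], fake[[r]][!absent],
                           resamples = 500)
  }
}
ptable
```

Indexing the matrix by name keeps each p-value tied to its species and response, so no separate `rownames()` step is needed afterwards.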
