I have a large number of data frames in R, and I want to perform the same actions on all of them in a for loop.
The data frames contain gene expression data. For each gene, there is information on upregulation/downregulation and an associated P value. Ultimately, I want to obtain a new data frame containing the number of significantly (P < 0.05) up- and downregulated genes for each data frame.
I am going about this in two steps:
- split each data frame into subsets containing only the upregulated or only the downregulated genes
- count the number of significant genes in each subset
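For a single data frame, the two steps can be wrapped in one small function. This is a minimal sketch: `count_sig` is a hypothetical helper name, and it assumes `Pvalue` is a numeric column.

```r
# Example data: df1 from the question, built with data.frame() so Pvalue is numeric
df1 <- data.frame(
  gene      = paste0("gene", 1:6),
  direction = c("up", "up", "down", "down", "down", "up"),
  Pvalue    = c(0.05, 0.06, 0.001, 0.075, 0.11, 0.12)
)

# Hypothetical helper: performs both steps for ONE data frame
count_sig <- function(d, alpha = 0.05) {
  # Step 1: subset into down- and upregulated genes
  d_down <- d[d$direction == "down", ]
  d_up   <- d[d$direction == "up", ]
  # Step 2: count genes with P strictly below alpha in each subset
  c(down = sum(d_down$Pvalue < alpha), up = sum(d_up$Pvalue < alpha))
}

counts_df1 <- count_sig(df1)
print(counts_df1)
```

Because the comparison is strict (`<`), gene1 with P = 0.05 is not counted, so `df1` gives one significant downregulated gene and no significant upregulated genes.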
First, let's make two dummy data frames:
```r
#data frame 1
gene = c('gene1','gene2','gene3','gene4','gene5','gene6')
direction = c('up','up','down','down','down','up')
Pvalue = as.numeric(c(0.05,0.06,0.001,0.075,0.11,0.12))
df1 = as.data.frame(cbind(gene,direction,Pvalue))
```

```
> df1
   gene direction Pvalue
1 gene1        up   0.05
2 gene2        up   0.06
3 gene3      down  0.001
4 gene4      down  0.075
5 gene5      down   0.11
6 gene6        up   0.12
```
```r
#data frame 2
gene = c('gene1','gene2','gene3','gene4','gene5','gene6')
direction = c('down','up','down','down','up','up')
Pvalue = as.numeric(c(0.043,0.001,0.34,0.96,0.001,0.04))
df2 = as.data.frame(cbind(gene,direction,Pvalue))
```

```
> df2
   gene direction Pvalue
1 gene1      down  0.043
2 gene2        up  0.001
3 gene3      down   0.34
4 gene4      down   0.96
5 gene5        up  0.001
6 gene6        up   0.04
```
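One side issue with these dummy data frames: `cbind()` on these vectors builds a character matrix, so `Pvalue` ends up as a character column (or a factor in R versions before 4.0.0, where `stringsAsFactors` defaulted to `TRUE`), and `Pvalue < 0.05` then compares strings rather than numbers. A sketch that keeps the column numeric by calling `data.frame()` directly; `df1_cbind` is a hypothetical name used only for the comparison:

```r
# Build df1 without cbind(), so Pvalue stays numeric
df1 <- data.frame(
  gene      = c("gene1", "gene2", "gene3", "gene4", "gene5", "gene6"),
  direction = c("up", "up", "down", "down", "down", "up"),
  Pvalue    = c(0.05, 0.06, 0.001, 0.075, 0.11, 0.12)
)

# For comparison: the cbind() route, as in the question
df1_cbind <- as.data.frame(cbind(gene = df1$gene,
                                 direction = df1$direction,
                                 Pvalue = df1$Pvalue))

is.numeric(df1$Pvalue)        # TRUE
is.numeric(df1_cbind$Pvalue)  # FALSE: character (or factor in older R)
```

With these particular values the string comparison happens to give the same counts, but that is a coincidence of the decimal formatting and should not be relied on.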
Then I made a character vector containing the names of all the data frames:

```r
df_summary = c('df1','df2')
```
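Note that `df_summary` holds only the names, as strings, not the data frames themselves. A short sketch showing what the loop variable actually is, and how base R's `get()` looks up the object a name refers to; the two small data frames here are made up for illustration:

```r
# Made-up example data frames
df1 <- data.frame(direction = c("up", "down"), Pvalue = c(0.01, 0.20))
df2 <- data.frame(direction = c("down", "down"), Pvalue = c(0.03, 0.04))
df_summary <- c("df1", "df2")

loop_classes <- character(0)
for (df in df_summary) {
  # df is a string such as "df1"; get(df) fetches the object with that name
  loop_classes <- c(loop_classes, class(df), class(get(df)))
}
print(loop_classes)
```

On each iteration the loop variable is of class `"character"`, while `get(df)` returns a `"data.frame"`.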
I then loop over these names to carry out steps 1 and 2:

```r
df3 = data.frame()
for (df in df_summary){
  df_down = df[df$direction == 'down',]
  df_up = df[df$direction == 'up',]
  df_down_sign = length(which(df_down$Pvalue < 0.05))
  df_up_sign = length(which(df_up$Pvalue < 0.05))
  df3 = rbind.data.frame(df3, c(df_down_sign,df_up_sign))
}
```
The same code works fine on an individual data frame outside the loop, but inside the loop it throws the following error:

```
Error: $ operator is invalid for atomic vectors
```
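The message comes from applying `$` to a character string: on the first iteration the loop variable `df` is `"df1"`, a length-one character vector, which is an atomic vector rather than a list or data frame. A minimal reproduction, wrapped in `tryCatch()` so the message can be inspected:

```r
df <- "df1"  # what the loop variable holds on the first iteration

# $ is not defined for atomic vectors, so this raises an error
msg <- tryCatch(
  df$direction,
  error = function(e) conditionMessage(e)
)
print(msg)
```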
The output I am looking for should look like this:
```
  dataframe number
1       df1      1
2       df1      0
3       df2      1
4       df2      3
```
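One way to produce that output, sketched under the assumption that the data frames are first collected into a named list with base R's `mget()` instead of being referenced by name inside the loop. The column names `dataframe` and `number` follow the desired output; the extra `direction` column is an addition so it is clear which row is which:

```r
# Example data from the question, built with data.frame() so Pvalue is numeric
df1 <- data.frame(
  gene      = paste0("gene", 1:6),
  direction = c("up", "up", "down", "down", "down", "up"),
  Pvalue    = c(0.05, 0.06, 0.001, 0.075, 0.11, 0.12)
)
df2 <- data.frame(
  gene      = paste0("gene", 1:6),
  direction = c("down", "up", "down", "down", "up", "up"),
  Pvalue    = c(0.043, 0.001, 0.34, 0.96, 0.001, 0.04)
)
df_summary <- c("df1", "df2")

# Look up all data frames by name once: a named list of data frames
df_list <- mget(df_summary)

rows <- list()
for (name in names(df_list)) {
  d <- df_list[[name]]  # the actual data frame, not its name
  for (dir in c("down", "up")) {
    # count genes in this direction with P strictly below 0.05
    n_sig <- sum(d$direction == dir & d$Pvalue < 0.05)
    rows[[length(rows) + 1]] <- data.frame(dataframe = name,
                                           direction = dir,
                                           number = n_sig)
  }
}
df3 <- do.call(rbind, rows)
print(df3)
```

With the question's data, `df3$number` is 1, 0, 1, 3, matching the desired output. A smaller change to the original loop would be to add `d <- get(df)` as its first line and use `d` in place of `df` in the lines that follow.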
So my question is: why am I getting this error in the for loop, and how can I solve it?