I have a large series of data frames in R on which I want to perform some actions all at once in a for loop.

The data frames contain gene expression data. For each gene, there is information on upregulation/downregulation and an associated P-value. Ultimately, I want to obtain a new data frame containing the number of significantly (P-value < 0.05) up- and downregulated genes for each data frame.

I am going about this in two steps:

  1. split each data frame into two subsets, one with only upregulated and one with only downregulated genes
  2. count the significant genes in each subset

First, let's make two dummy data frames:

#data frame 1
gene = c('gene1','gene2','gene3','gene4','gene5','gene6')
direction = c('up','up','down','down','down','up')
Pvalue = c(0.05,0.06,0.001,0.075,0.11,0.12)
#data.frame() keeps Pvalue numeric; cbind() would coerce every column to character
df1 = data.frame(gene,direction,Pvalue)
> df1
   gene direction Pvalue
1 gene1        up  0.050
2 gene2        up  0.060
3 gene3      down  0.001
4 gene4      down  0.075
5 gene5      down  0.110
6 gene6        up  0.120
#data frame 2
gene = c('gene1','gene2','gene3','gene4','gene5','gene6')
direction = c('down','up','down','down','up','up')
Pvalue = c(0.043,0.001,0.34,0.96,0.001,0.04)
df2 = data.frame(gene,direction,Pvalue)
> df2
   gene direction Pvalue
1 gene1      down  0.043
2 gene2        up  0.001
3 gene3      down  0.340
4 gene4      down  0.960
5 gene5        up  0.001
6 gene6        up  0.040

Then, I made a character vector containing the names of all data frames:

df_summary = c('df1','df2')

I then use a for loop over this vector to do steps 1 and 2 outlined above:

df3 = data.frame()
for (df in df_summary){
  df_down = df[df$direction == 'down',]
  df_up = df[df$direction == 'up',]
  df_down_sign = length(which(df_down$Pvalue < 0.05))
  df_up_sign = length(which(df_up$Pvalue < 0.05))
  df3 = rbind.data.frame(df3, c(df_down_sign,df_up_sign))
}

The body of the loop works fine on an individual data frame outside the loop, but running the loop throws the following error:

Error: $ operator is invalid for atomic vectors

The output I am looking for should look like this:

  dataframe number
1       df1      1
2       df1      0
3       df2      1
4       df2      3

So my question: why am I getting this error in the for loop, and how can I solve it?

enileve

2 Answers

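The error occurs because the loop iterates over the character strings 'df1' and 'df2' rather than the data frames themselves, so df$direction applies $ to an atomic character vector. The following solves the problem by collecting the data frames into a named list and summarising each one.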

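# Collect the input data frames into a named list.
# A bare "^df" would also match df3 and df_summary if they already exist,
# so the pattern is restricted to the two inputs here; adapt it to your naming.
df_list <- mget(ls(pattern = "^df[12]$"))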

df3 <- lapply(seq_along(df_list), function(i){
  dftmp <- df_list[[i]]
  dfname <- names(df_list)[i]
  agg <- aggregate(Pvalue ~ direction, dftmp, function(x) sum(x < 0.05))
  cbind.data.frame(dataframe = dfname, agg)
})
df3 <- do.call(rbind, df3)

df3
#  dataframe direction Pvalue
#1       df1      down      1
#2       df1        up      0
#3       df2      down      1
#4       df2        up      3
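
To see where the error in the question comes from, here is a minimal sketch, assuming a small stand-in data frame called df1 (the variable names name, err and n_down_sig are illustrative only, and the tryCatch wrapper is there just so the script keeps running). It shows that the loop variable holds a name, not a data frame, and that get() is one way to look the object up by that name.

```r
# Stand-in data frame (illustrative; same columns as in the question)
df1 <- data.frame(
  gene      = c("gene1", "gene2", "gene3"),
  direction = c("up", "down", "down"),
  Pvalue    = c(0.05, 0.001, 0.075)
)

# What the loop variable holds when iterating over c('df1','df2')
name <- "df1"

# `$` on a character vector raises an error; capture its message
err <- tryCatch(name$direction, error = function(e) conditionMessage(e))
print(err)

# get() looks the object up by name and returns the actual data frame
df <- get(name)
n_down_sig <- sum(df$direction == "down" & df$Pvalue < 0.05)
print(n_down_sig)
```

Looping over a list of data frames, as mget() produces above, avoids the lookup by name altogether.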
Rui Barradas

Turns out that after posting my question I came across something that looks like a solution.

Simply running

df_summary = list(df1,df2)

instead of

df_summary = c('df1','df2')

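seems to solve my problem! With a list of data frames, the loop variable is the data frame itself rather than its name, so df$direction works.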
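
For completeness, a sketch of the whole loop with this change, assuming the dummy data from the question. Two things here are my own additions rather than part of the original code: the list is named so the data frame names stay available inside the loop, and a direction column is added to the result for readability.

```r
# Dummy data from the question, built with data.frame() so Pvalue stays numeric
df1 <- data.frame(
  gene      = paste0("gene", 1:6),
  direction = c("up", "up", "down", "down", "down", "up"),
  Pvalue    = c(0.05, 0.06, 0.001, 0.075, 0.11, 0.12)
)
df2 <- data.frame(
  gene      = paste0("gene", 1:6),
  direction = c("down", "up", "down", "down", "up", "up"),
  Pvalue    = c(0.043, 0.001, 0.34, 0.96, 0.001, 0.04)
)

# A named list keeps the data frame names available inside the loop
df_summary <- list(df1 = df1, df2 = df2)

df3 <- data.frame()
for (name in names(df_summary)) {
  df  <- df_summary[[name]]          # the data frame itself, not its name
  sig <- df$Pvalue < 0.05            # TRUE for significant genes
  counts <- c(sum(sig & df$direction == "down"),
              sum(sig & df$direction == "up"))
  df3 <- rbind(df3, data.frame(dataframe = name,
                               direction = c("down", "up"),
                               number    = counts))
}
print(df3)
```

Growing a data frame with rbind() inside a loop is fine for a handful of inputs; for a large series, building a list of per-frame results and binding once at the end is usually cheaper.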

enileve
  • I found this answer here [Running for loop for multiple dataframes in R?](https://stackoverflow.com/questions/62240627/running-for-loop-for-multiple-dataframes-in-r?rq=1) – enileve Apr 09 '21 at 17:17