I am working in R with 10 lists (files1, files2, files3, ... files10). Each list contains multiple dataframes.

Now, I want to extract some values from each dataframe in each list.

I was going to use a for loop

nt = c("A", "C", "G", "T")
for (i in files1) {
    for (j in nt) {
        name = paste(j, i, sep = "-") # here I want as output name = "files1-A". However this doesn't work. How can I get the name of the list "files1"?
        colname = paste("percentage", j, sep = "") # here I want as output colname = percentageA. This works
        assign(name, unlist(lapply(i, function(x) x[here I want to use the column with the name "percentageA", so 'colname'][x$position==1000])))
    }
}

So, I have trouble using the names of lists and assigning them to variables.

I know this only loops through the first list, but is it also possible to loop through all my lists at once?

In other words: how can I put the code below in a for loop?

A_files1 = unlist(lapply(files1, function(x) x$percentageA[x$position==1000]))
C_files1 = unlist(lapply(files1, function(x) x$percentageC[x$position==1000]))
G_files1 = unlist(lapply(files1, function(x) x$percentageG[x$position==1000]))
T_files1 = unlist(lapply(files1, function(x) x$percentageT[x$position==1000]))

A_files2 = unlist(lapply(files2, function(x) x$percentageA[x$position==1000]))
C_files2 = unlist(lapply(files2, function(x) x$percentageC[x$position==1000]))
G_files2 = unlist(lapply(files2, function(x) x$percentageG[x$position==1000]))
T_files2 = unlist(lapply(files2, function(x) x$percentageT[x$position==1000]))

....

A_files10 = unlist(lapply(files10, function(x) x$percentageA[x$position==1000]))
C_files10 = unlist(lapply(files10, function(x) x$percentageC[x$position==1000]))
G_files10 = unlist(lapply(files10, function(x) x$percentageG[x$position==1000]))
T_files10 = unlist(lapply(files10, function(x) x$percentageT[x$position==1000]))
user1987607
  • does `names(files1)` return `NULL`? – joel.wilson Dec 29 '16 at 11:09
  • @joel.wilson: yes it does – user1987607 Dec 29 '16 at 12:30
  • It would be great to post sample data, for example 2-3 files, to have a working example. See [how to make a reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5965451#5965451). In general, for reading multiple files I create a function(variable1,variable2) that returns a data frame from a single file. Then I use the `dplyr` package with `group_by(variable1, variable2)` `do(myfunction(.$variable1,.$variable2))` to read multiple files. This is a great way to get all data into a single data frame. – Paul Rougieux Dec 29 '16 at 12:38
  • @PaulRougieux: I've added more explanations. Is it more clear now? – user1987607 Dec 29 '16 at 14:00
  • You still didn't provide sample data for file1, file2, ... . It's great to know what you tried but without clear input and clear output, it's hard to give you an answer that works. Just give sample input data from one or 2 files and the desired output. – Paul Rougieux Dec 29 '16 at 14:37
  • See also http://stackoverflow.com/a/9950217/2641825 and http://stackoverflow.com/a/18434780/2641825 – Paul Rougieux Dec 29 '16 at 15:43

2 Answers

To answer your question, I created a dummy list containing data frames:

n = data.frame(andrea=c(1983, 11, 8),paja=c(1985, 4, 3)) 
s = data.frame(col1=c("aa", "bb", "cc", "dd", "ee")) 
b = data.frame(col1=c(TRUE, FALSE, TRUE, FALSE, FALSE)) 
x = list(n, s, b, 3)   # x contains copies of n, s, b
names(x) <- c("dataframe1","dataframe2","dataframe3","dataframe4")
files1 = x

Now, stepping into what happens in your loop:

i = files1
j = "A"

If you want the names of your data frames with the suffix taken from nt (in this case "A"), you have to use names(i):

name_wrong = paste(j, i, sep = "-") 
name       = paste(names(i),j,sep = "-")

So you obtain:

> name
[1] "dataframe1-A" "dataframe2-A" "dataframe3-A" "dataframe4-A"

I hope it is what you need.
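
Note that names() only returns the names of the elements inside a list; the name of the list variable itself (such as "files1") is not available inside the loop. One possible way around this, sketched below, is to gather the lists into a single named list and loop over its names. The sample data, the helper make_df, and the names all_files and results are hypothetical and only serve the illustration; columns are selected with x[[colname]] so that the column name can be built as a string.

```r
nt <- c("A", "C", "G", "T")

# Hypothetical sample data: two lists of two data frames each
make_df <- function(offset) {
  data.frame(position    = c(999, 1000, 1001),
             percentageA = offset + c(1, 2, 3),
             percentageC = offset + c(4, 5, 6),
             percentageG = offset + c(7, 8, 9),
             percentageT = offset + c(10, 11, 12))
}
files1 <- list(make_df(0), make_df(100))
files2 <- list(make_df(200), make_df(300))

# Collect the lists into one named list; the names carry "files1", "files2", ...
# (assumption: with all ten lists in the workspace,
#  mget(paste0("files", 1:10)) would build the same kind of structure)
all_files <- list(files1 = files1, files2 = files2)

# Store the results in a named list instead of calling assign()
results <- list()
for (listname in names(all_files)) {
  for (j in nt) {
    colname <- paste0("percentage", j)        # e.g. "percentageA"
    name    <- paste(j, listname, sep = "_")  # e.g. "A_files1"
    results[[name]] <- unlist(lapply(all_files[[listname]],
                                     function(x) x[[colname]][x$position == 1000]))
  }
}

results$A_files1  # values of percentageA at position 1000 for each data frame in files1
```

Keeping the vectors in one list (results$A_files1, results$C_files2, ...) rather than in separately assigned variables makes it easier to loop over them again later.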

Andrea Ianni
  • This is not exactly what I want. I don't want to name all my dataframes, I just want to use the name of my lists. – user1987607 Dec 29 '16 at 12:33
  • How about you put all your lists in a list: `biglist <- list(files1 = files1)`. Then `names(biglist)` will return `[1] "files1"`. – Paul Rougieux Dec 29 '16 at 12:55
I think this data would be easier to manipulate if you flatten the data structure. Instead of 10 lists of data frames, you could use one single data frame with all observations indexed by their names and file names.

Generate sample data and use code from the question

Simplified data with only 10 or 11 points per item. I suppose the items in the list have a different number of rows?

files1 <- list(item1 = data.frame(position = 1:10,
                                  percentageA = 1:10/10,
                                  percentageC = 1:10/10,
                                  percentageG = 1:10/10,
                                  percentageT = 1:10/10),
               item2 = data.frame(position = 1:11,
                                  percentageA = 1:11/20,
                                  percentageC = 1:11/20,
                                  percentageG = 1:11/20,
                                  percentageT = 1:11/20))
str(files1) # look at the structure of the list

# Select the 9th position using your code
A_files1 = unlist(lapply(files1, function(x) x$percentageA[x$position==9]))
C_files1 = unlist(lapply(files1, function(x) x$percentageC[x$position==9]))
G_files1 = unlist(lapply(files1, function(x) x$percentageG[x$position==9]))
T_files1 = unlist(lapply(files1, function(x) x$percentageT[x$position==9]))

Flatten the list of dataframes into one dataframe

# Add name to each data frame
# Inspired by this answer
# http://stackoverflow.com/a/18434780/2641825


# For information l[1] creates a single list item
# l[[1]] extracts the data frame from the list
#' @param i index
#' @param listoffiles list of data frames
addname <- function(i, listoffiles){
     dtf <- listoffiles[[i]] # Extract the dataframe from the list
     dtf$name <- names(listoffiles[i]) # Add the name inside the data frame
     return(dtf)
}
# Add the name inside each data frame
files1 <- lapply(seq_along(files1), addname, files1)
str(files1) # look at the structure of the list
files1table <-  Reduce(rbind,files1) 

# Get the values of interest with
files1table$percentageA[files1table$position == 9]
# [1] 0.90 0.45

# Get all Letters of interest with
subset(files1table,position==9)

#   position percentageA percentageC percentageG percentageT  name
# 9         9        0.90        0.90        0.90        0.90 item1
# 19        9        0.45        0.45        0.45        0.45 item2
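
The same flattening step can probably be written more compactly with base R's Map() and do.call(). This is only a sketch of an alternative, not a replacement for the code above; the names sample_files, with_names and sample_table are hypothetical and chosen so that they do not overwrite files1.

```r
# Hypothetical sample data with the same shape as the example above
sample_files <- list(item1 = data.frame(position = 1:10, percentageA = 1:10 / 10),
                     item2 = data.frame(position = 1:11, percentageA = 1:11 / 20))

# Map() walks over the data frames and their names in parallel
# and adds the name as a column inside each data frame
with_names <- Map(function(dtf, nm) { dtf$name <- nm; dtf },
                  sample_files, names(sample_files))

# do.call(rbind, ...) binds all data frames in a single call
sample_table <- do.call(rbind, with_names)
rownames(sample_table) <- NULL  # drop the "item1.1"-style row names

subset(sample_table, position == 9)
```

For long lists, a single do.call(rbind, ...) call avoids the repeated pairwise binding that Reduce(rbind, ...) performs.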

Flatten all your lists of lists of dataframes into a single dataframe

# Now create another list, files2, a duplicate just for the sake of the example 
files2 <- files1 
# files1 and files2 both have a name column inside their dataframes already 
# Create a list of lists of dataframes
lolod <- list(files1 = files1, files2 = files2) 
str(lolod) # a list of lists
# Flatten to a list of dataframes
# Use sapply to keep names based on this answer http://stackoverflow.com/a/9469981/2641825
lod <- sapply(lolod,  Reduce, f=rbind, simplify = FALSE, USE.NAMES = TRUE) 
# Add the name inside each data frame again
addfilename <- function(i, listoffiles){
     dtf <- listoffiles[[i]] # Extract the dataframe from the list
     dtf$filename <- names(listoffiles[i]) # Add the name inside the data frame
     return(dtf)
}
lod <- lapply(seq_along(lod), addfilename, lod)


# Flatten to a dataframe
d <- Reduce(rbind, lod)
# Now the data structure is flattened and much easier to deal with

subset(d,position==9)
#    position percentageA percentageC percentageG percentageT  name filename
# 9         9        0.90        0.90        0.90        0.90 item1   files1
# 19        9        0.45        0.45        0.45        0.45 item2   files1
# 30        9        0.90        0.90        0.90        0.90 item1   files2
# 40        9        0.45        0.45        0.45        0.45 item2   files2
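
Once the data are flat, the per-list vectors from the question (A_files1, A_files2, ...) can be recovered by splitting on the filename column. The sketch below uses a small hypothetical data frame, d_example, with the same kind of columns as d; hits and A_by_file are illustrative names as well.

```r
# Hypothetical flattened data frame with the same kind of columns as `d`
d_example <- data.frame(
  position    = rep(c(8, 9), times = 4),
  percentageA = c(0.80, 0.90, 0.40, 0.45, 0.80, 0.90, 0.40, 0.45),
  name        = rep(c("item1", "item1", "item2", "item2"), times = 2),
  filename    = rep(c("files1", "files2"), each = 4),
  stringsAsFactors = FALSE
)

# Keep only the position of interest
hits <- subset(d_example, position == 9)

# One vector per original list, analogous to A_files1, A_files2, ... in the question
A_by_file <- split(hits$percentageA, hits$filename)

A_by_file$files1
```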

This answer is much longer than I expected it to be. I hope it didn't frighten you. In the spirit of tidy data, simplifying the data structure will make your work easier later on. This complex renaming of lists would probably not have been necessary if the names had been stored inside the original data.

Paul Rougieux