In R, reorganize list based on element names (rbind and indicator variable)

Question

I am trying to reorganize my data, which is a list of data.frames. Its elements represent subjects of interest (A and B), with observations on x and y collected on two occasions (1 and 2). I want to turn this into a list with one data.frame per subject, where the occasion on which x and y were collected is stored in each data.frame as a new variable rather than in the element name:

library('rlist')

A1 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))
A2 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))
B1 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))
B2 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))

list <- list(A1=A1,A2=A2,B1=B1,B2=B2)

A <- do.call(rbind,list.match(list,"A"))
B <- do.call(rbind,list.match(list,"B"))

list <- list(A=A,B=B)
list <- lapply(list,function(x) {
      y <- data.frame(x)
      y$class <- c(rep.int(1,2),rep.int(2,2))
      return(y)
})

> list
$A
      x  y class
A1.1 66 96     1
A1.2 76 58     1
A2.1 50 93     2
A2.2 57 12     2

$B
      x  y class
B1.1 58 56     1
B1.2 69 15     1
B2.1 77 77     2
B2.2  9  9     2

In my real world problem there are about 500 subjects, not always two occasions, differing numbers of observations.

So my example above only illustrates where I want to get. I am stuck on how to tell the do.call/rbind step to bind subject-specific elements together into new list elements, based on the element names, while also adding the new variable.

To me, this is a somewhat fuzzy task, and the closest I got was the rlist package. This question is related but uses unique to identify elements, whereas in my case it seems to be more a regex problem.

I'd be happy even for instructions on how to use google, any keywords for further research etc.

If you supply a few of the real data frame names you have, someone trying to help will be able to write a more accurate regex for you. — Pierre L, Oct 06 '15 at 13:33
I thought leaving the regex issue at a more abstract level would de-clutter my question. Of course, stackoverflow proved to have really helpful people around, once again. I'll process your answer below and get back :) — leokrkr, Oct 06 '15 at 14:30

Pierre L · Answer 1 · 2015-10-06T19:17:45.743

From the data you provided:

subj <- sub("[A-Z]*", "", names(lst))
newlst <- Map(function(x, y) {x[,"class"] <- y;x}, lst, subj)

First we use a regular expression to isolate the number that will go in the class column. In this case, I matched the capital letters and removed them, leaving the number, so "A1" becomes "1". Note that the result is a character string; wrap it in as.integer() if you need a numeric column. Your real names may well require a different regex pattern.
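As a small illustration of how the pattern depends on the naming scheme, the sketch below uses made-up names (not the asker's real ones) and assumes every name is a subject label followed by a trailing occasion number:

```r
# Hypothetical element names; the real names may follow a different scheme.
nms <- c("A1", "A2", "B1", "AB10")

# Occasion: drop the leading run of non-digit characters.
occasion <- sub("^\\D+", "", nms)

# Subject: drop the trailing run of digits.
subject <- sub("\\d+$", "", nms)

occasion
subject
```

Anchoring the patterns with ^ and $ keeps multi-letter subjects and multi-digit occasions intact, which matters once the names get longer than "A1".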

Then we use Map to add the new column to each data frame and save the result in a new list called newlst. Map applies the function to the first element of each argument, then to the second element of each, and so on. So the first data frame in lst and the first number in subj are used together first. The anonymous function is function(x, y) {x[, "class"] <- y; x}. It takes two arguments: the data frame and the value for the new column.
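To make the pairing behaviour of Map concrete, here is a minimal sketch on two tiny data frames. The names dfs, labels and out and all values are made up for illustration:

```r
# Two tiny data frames and one label per data frame (made-up values).
dfs <- list(A1 = data.frame(x = 1:2), A2 = data.frame(x = 3:4))
labels <- c("1", "2")

# Map() calls the function on (dfs[[1]], labels[1]), then on (dfs[[2]], labels[2]).
out <- Map(function(df, lab) {
  df$class <- lab  # the single label is recycled down the new column
  df
}, dfs, labels)

names(out)    # Map keeps the names of its first argument
out$A2$class
```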

Now it's much easier to move forward. We can create a vector called uniq.nmes that holds the names of the combined data frames, so that "A1" becomes "A". Then we can rbind the elements whose names match:

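# Strip the trailing occasion number ("A1" -> "A", "A10" -> "A")
uniq.nmes <- unique(sub("\\d+$", "", names(lst)))
lapply(uniq.nmes, function(x) {
  # Anchor the match so that subject "A" does not also pick up e.g. "AB1"
  do.call(rbind, newlst[grep(paste0("^", x, "\\d+$"), names(newlst))])
})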
# [[1]]
#       x  y class
# A1.1  1 79     1
# A1.2 30 13     1
# A2.1 90 39     2
# A2.2 43 22     2
# 
# [[2]]
#       x  y class
# B1.1 54 59     1
# B1.2 83 90     1
# B2.1 85 36     2
# B2.2 91 28     2
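
The result above is an unnamed list, whereas the question asked for elements named after the subjects. One possible variation, sketched here with deterministic stand-in data and assuming every name is a subject label followed by trailing digits, is to group the list with split before binding:

```r
# Deterministic stand-in for the sampled data in this answer.
lst <- list(A1 = data.frame(x = 1:2, y = 3:4),
            A2 = data.frame(x = 5:6, y = 7:8),
            B1 = data.frame(x = 9:10, y = 11:12))

# Add the occasion as a column, as in the Map() step.
occasion <- sub("^\\D+", "", names(lst))
newlst <- Map(function(df, occ) { df$class <- occ; df }, lst, occasion)

# Group the data frames by subject (name minus trailing digits),
# then rbind each group into one data frame.
subject <- sub("\\d+$", "", names(newlst))
bysubj <- lapply(split(newlst, subject), function(grp) do.call(rbind, grp))

names(bysubj)
bysubj$A
```

Because split names its groups after the subject labels, the combined list can be indexed as bysubj$A and bysubj$B without any further renaming.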

Data

A1 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))
A2 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))
B1 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))
B2 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))

lst <- list(A1=A1,A2=A2,B1=B1,B2=B2)

score 0 · Accepted Answer · answered Oct 06 '15 at 13:50

It sounds like you're doing a lot of gymnastics because you have a specific form in mind. What I would suggest is first trying to make the data tidy. Without reading the link, the quick summary is to put your data into a single data frame, where it can be easily processed.

The quick version of the answer (here I've used lst instead of list for the name to avoid confusion with the built-in list) is to do this:

do.call(rbind,
  lapply(seq_along(lst), function(i) {
    lst[[i]]$type <- names(lst)[i]; lst[[i]]
  })
)

What this will do is create a single data frame, with a column, "type", that contains the name of the list item in which that row appeared.

Using a slightly simplified version of your initial data:

lst <- list(A1=data.frame(x=rnorm(5)), A2=data.frame(x=rnorm(3)), B=data.frame(x=rnorm(5)))
lst
$A1
           x
1  1.3386071
2  1.9875317
3  0.4942179
4 -0.1803087
5  0.3094100

$A2
           x
1 -0.3388195
2  1.1993115
3  1.9524970

$B
           x
1 -0.1317882
2 -0.3383545
3  0.8864144
4  0.9241305
5 -0.8481927

And then applying the function from above:

df <- do.call(rbind,
   lapply(seq_along(lst), function(i) {
     lst[[i]]$type <- names(lst)[i]; lst[[i]]
   })
 )
df
            x type
1   1.3386071   A1
2   1.9875317   A1
3   0.4942179   A1
4  -0.1803087   A1
5   0.3094100   A1
6  -0.3388195   A2
7   1.1993115   A2
8   1.9524970   A2
9  -0.1317882    B
10 -0.3383545    B
11  0.8864144    B
12  0.9241305    B
13 -0.8481927    B

From here we can process the data to our heart's content. For example, df$subject <- gsub("[0-9]*", "", df$type) extracts the non-numeric portion of type, and split can then generate the per-subject sub-lists that you mention in your question.
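
A minimal sketch of that step, using a small made-up data frame shaped like df above. The occasion column and the name bysubj are additions for illustration, not part of the original answer:

```r
# Small long-format data frame shaped like df above (made-up values).
df <- data.frame(x = c(0.1, 0.2, 0.3, 0.4, 0.5),
                 type = c("A1", "A1", "A2", "B", "B"),
                 stringsAsFactors = FALSE)

# Non-numeric part of type is the subject: "A1" -> "A", "B" -> "B".
df$subject <- gsub("[0-9]*", "", df$type)

# Numeric part is the occasion: "A1" -> 1; names without a number give NA.
df$occasion <- as.integer(gsub("[^0-9]", "", df$type))

# One data frame per subject, named after the subject.
bysubj <- split(df, df$subject)
names(bysubj)
bysubj$A
```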

In addition, once it is in this form, you can use functions like by and aggregate or libraries like dplyr or data.table to do more advanced split-apply-combine operations for data analysis.
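
As one example of such a split-apply-combine operation, the following sketch uses base R's aggregate to compute a mean per subject and occasion. The data and the column names subject and occasion are assumed for illustration:

```r
# Made-up long-format data with subject and occasion already extracted.
df <- data.frame(x = c(1, 3, 5, 7, 9),
                 subject = c("A", "A", "A", "B", "B"),
                 occasion = c(1, 1, 2, 1, 1))

# Mean of x within each subject/occasion combination.
means <- aggregate(x ~ subject + occasion, data = df, FUN = mean)
means
```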

As long as my reputation does not allow for upvotes, I will leave my thanks here. I need more time to process your answer than you guys to write them! — leokrkr, Oct 06 '15 at 14:33
