flatten a data frame

Question

I have this nested data frame

test <- structure(list(id = c(13, 27), seq = structure(list(
`1` = c("1997", "1997", "1997", "2007"),
`2` = c("2007", "2007", "2007", "2007", "2007", "2007", "2007")), 
.Names = c("1", "2"))), .Names = c("penr", 
"seq"), row.names = c("1", "2"), class = "data.frame")

I want a single vector containing all values in the second column, namely

result <- c("1997", "1997", "1997", "2007", "2007", "2007", "2007", "2007", "2007", "2007", "2007")

Is there an easy way to achieve this?

Paul Hiemstra · Accepted Answer · 2012-02-27T15:20:52.667

16

This line does the trick:

do.call("c", test[["seq"]])

or equivalent:

c(test[["seq"]], recursive = TRUE)

or even:

unlist(test[["seq"]])

The output of these functions is:

    11     12     13     14     21     22     23     24     25     26     27 
"1997" "1997" "1997" "2007" "2007" "2007" "2007" "2007" "2007" "2007" "2007"

To get rid of the names above the character vector, call as.character on the resulting object:

> as.character(unlist(test[["seq"]]))
 [1] "1997" "1997" "1997" "2007" "2007" "2007" "2007" "2007" "2007" "2007"
[11] "2007"
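As a minimal sketch of the steps above (the data frame is rebuilt from the question; `unname()` and `use.names = FALSE` are base R alternatives not mentioned in the original answer, and the variable names are only illustrative), the three flattening calls and the name-dropping step could be checked like this:

```r
# Rebuild the nested data frame from the question (same content, tidier layout).
test <- structure(
  list(
    penr = c(13, 27),
    seq = list(
      `1` = c("1997", "1997", "1997", "2007"),
      `2` = rep("2007", 7)
    )
  ),
  row.names = c("1", "2"),
  class = "data.frame"
)

# The three flattening approaches from the answer.
via_do_call <- do.call("c", test[["seq"]])
via_c       <- c(test[["seq"]], recursive = TRUE)
via_unlist  <- unlist(test[["seq"]])

# Three ways to drop the names; as.character() is the one used in the answer,
# the other two are base R alternatives added here as an assumption.
flat_chr    <- as.character(via_unlist)
flat_unname <- unname(via_unlist)
flat_nonames <- unlist(test[["seq"]], use.names = FALSE)

# All name-free results should hold the same 11 values.
print(identical(flat_chr, flat_unname))
print(identical(flat_chr, flat_nonames))
print(length(flat_chr))
```

`unlist(x, use.names = FALSE)` avoids building the names in the first place, which may matter for long lists.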

edited Feb 27 '12 at 15:20

answered Feb 27 '12 at 15:15


Could you tick the mark below my answer? In that way everyone knows this question has been answered (and I get some rep :)) – Paul Hiemstra Feb 27 '12 at 15:22
of course - but I have to wait for some minutes because of a limitation of stack exchange. You were too fast :) – speendo Feb 27 '12 at 15:25
With this kind of question one has to be fast. I am surprised that no other answers were posted simultaneously by e.g. @Andrie ;). – Paul Hiemstra Feb 27 '12 at 15:31
I typically use unlist in this situation because I didn't know there was an alternative. Thanks for sharing. Unlist is actually the slowest of the three methods. +1 – Tyler Rinker Feb 27 '12 at 15:42
@TylerRinker, if you have some benchmarks, please post them as I think that would be interesting. – Paul Hiemstra Feb 27 '12 at 15:43

score 5 · Answer 2 · answered Feb 27 '12 at 16:05

This is not an answer but a follow up/supplement to Paul's answer:

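Across the runs below, the c method is either the fastest or within timing noise of the fastest: at 1,000 iterations do.call edges it out (0.03 s vs. 0.04 s elapsed), and at 100,000 iterations c is clearly ahead. unlist is the slowest at 1,000 iterations, but as I increased the number of iterations to 100,000 it came close to the c method and overtook do.call.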

1000 iterations

     test replications elapsed relative user.self sys.self user.child sys.child
2       c         1000    0.04 1.333333      0.03        0         NA        NA
1 do.call         1000    0.03 1.000000      0.03        0         NA        NA
3  unlist         1000    0.23 7.666667      0.04        0         NA        NA

100,000 iterations

     test replications elapsed relative user.self sys.self user.child sys.child
2       c       100000    8.39 1.000000      3.62        0         NA        NA
1 do.call       100000   10.47 1.247914      4.04        0         NA        NA
3  unlist       100000    9.97 1.188319      3.81        0         NA        NA

Again thanks for sharing Paul!

Benchmarking was performed using rbenchmark on a Windows 7 machine running R 2.14.1.
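The benchmark code itself was not posted. A rough sketch of how such a comparison might be set up with `rbenchmark::benchmark` is shown here; the object name `seq_list`, the test labels, and the replication count are assumptions, and timings will differ by machine and R version:

```r
# The list column from the question, extracted as a plain named list.
seq_list <- list(
  `1` = c("1997", "1997", "1997", "2007"),
  `2` = rep("2007", 7)
)

# Only run the timing if rbenchmark is installed; it is not part of base R.
if (requireNamespace("rbenchmark", quietly = TRUE)) {
  timings <- rbenchmark::benchmark(
    do.call = do.call("c", seq_list),
    c       = c(seq_list, recursive = TRUE),
    unlist  = unlist(seq_list),
    replications = 1000  # raise to 100000 for the longer run
  )
  print(timings)
} else {
  message("rbenchmark is not installed; skipping the timing run")
}
```

With a list this small each call takes microseconds, so differences at 1,000 replications are likely dominated by timer resolution; higher replication counts or a larger list give more stable comparisons.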
