data.frame rows to a list

Question

I have a data.frame which I would like to convert to a list by rows, meaning each row would correspond to its own list element. In other words, I would like a list that is as long as the data.frame has rows.

So far, I've tackled this problem in the following manner, but I was wondering if there's a better way to approach this.

xy.df <- data.frame(x = runif(10),  y = runif(10))

# pre-allocate a list and fill it with a loop
xy.list <- vector("list", nrow(xy.df))
for (i in seq_len(nrow(xy.df))) {
    xy.list[[i]] <- xy.df[i,]
}

flodel · Accepted Answer · 2015-12-03T12:02:13.883

195

Like this:

xy.list <- split(xy.df, seq(nrow(xy.df)))

And if you want the rownames of xy.df to be the names of the output list, you can do:

xy.list <- setNames(split(xy.df, seq(nrow(xy.df))), rownames(xy.df))

edited Dec 03 '15 at 12:02

answered Jan 17 '13 at 00:45

flodel


Note that, after using `split` each element has type `data.frame with 1 rows and N columns` instead of `list of length N` – Karol Daniluk Feb 19 '19 at 19:40
I would only add that if you use `split` you should probably do `drop=T` otherwise your original levels for factors won't drop – Denis Jan 07 '20 at 16:50
@KarolDaniluk you can call `xy.list2 = lapply(xy.list,as.list)` to make its elements lists or `xy.list2 = lapply(xy.list,as.anything)`. – Luke Feb 11 '22 at 14:12
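
To make the comments above concrete, here is a minimal sketch on a small, made-up data frame (the values are illustrative, and `xy.list2` is the name used in the comment above). It suggests that `split` yields one-row data frames and that `lapply(..., as.list)` turns them into plain named lists.

```r
# Small illustrative data frame (values are made up for this sketch)
xy.df <- data.frame(x = c(0.1, 0.2, 0.3),
                    y = c("a", "b", "c"),
                    stringsAsFactors = FALSE)

# One list element per row; each element is still a one-row data.frame
xy.list <- split(xy.df, seq_len(nrow(xy.df)))
is.data.frame(xy.list[[1]])

# Convert every one-row data.frame into a plain named list
xy.list2 <- lapply(xy.list, as.list)
is.data.frame(xy.list2[[1]])
xy.list2[[2]]$y
```

Which shape is preferable depends on what the downstream code expects: one-row data frames keep row names, plain lists are lighter.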

score 64 · Answer 2 · answered Aug 16 '10 at 11:22

Eureka!

xy.list <- as.list(as.data.frame(t(xy.df)))

answered Aug 16 '10 at 11:22

Roman Luštrik


Beat me ;-) . Still, if you'd like just to loop over these values, better use apply. – mbq Aug 16 '10 at 13:16
Care to demonstrate how to use apply? – Roman Luštrik Aug 17 '10 at 06:04
`unlist(apply(xy.df, 1, list), recursive = FALSE)`. However, flodel's solution is more efficient than using `apply` or `t`. – Arun May 14 '13 at 09:13
The problem here is that `t` converts the `data.frame` to a `matrix`, so the elements in your list are atomic vectors, not lists as the OP requested. It is usually not a problem until your `xy.df` contains mixed types... – Calimo Feb 28 '14 at 14:40
If you want to loop over the values, I do not recommend `apply`. It's actually just a for loop implemented in R. `lapply` performs the looping in C, which is significantly faster. This list-of-rows format is actually preferable if you're doing a lot of looping. – Eli Sander Dec 21 '15 at 16:54
Adding another comment from the future, an `apply` version is `.mapply(data.frame, xy.df, NULL)` – alexis_laz Jul 24 '16 at 08:40
Many years later, you could do `lapply(transpose(unclass(dat)), as.list)` with the `data.table` package to return the same result. The one issue with these methods is that they convert each resulting element to a character vector if any column of the original data.frame is non-numeric. My (super late) answer provides a couple of methods that preserve the data types. – lmo Mar 16 '18 at 00:15
This made my day!! Super fast solution, great for package developers! – MS Berends Jul 03 '20 at 08:21
This solution changes the type of the data. It casts all data to the most generic column type. – skan Apr 19 '23 at 16:14
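
The type coercion mentioned in these comments can be seen in a short sketch. The data frame `mixed.df` and the names `via.t` and `via.rows` are made up for illustration; the row-wise `lapply` is just one possible type-preserving alternative, not the approach of this answer.

```r
# Illustrative data frame with one integer and one character column
mixed.df <- data.frame(id = 1:2, label = c("a", "b"),
                       stringsAsFactors = FALSE)

# t() goes through as.matrix(), so every value becomes character
via.t <- as.list(as.data.frame(t(mixed.df), stringsAsFactors = FALSE))
sapply(via.t, class)

# Row-wise extraction keeps each column's own type
via.rows <- lapply(seq_len(nrow(mixed.df)),
                   function(i) as.list(mixed.df[i, ]))
class(via.rows[[1]]$id)
```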

score 24 · Answer 3 · answered Nov 24 '17 at 18:23

A more modern solution uses only purrr::transpose:

library(purrr)
iris[1:2,] %>% purrr::transpose()
#> [[1]]
#> [[1]]$Sepal.Length
#> [1] 5.1
#> 
#> [[1]]$Sepal.Width
#> [1] 3.5
#> 
#> [[1]]$Petal.Length
#> [1] 1.4
#> 
#> [[1]]$Petal.Width
#> [1] 0.2
#> 
#> [[1]]$Species
#> [1] 1
#> 
#> 
#> [[2]]
#> [[2]]$Sepal.Length
#> [1] 4.9
#> 
#> [[2]]$Sepal.Width
#> [1] 3
#> 
#> [[2]]$Petal.Length
#> [1] 1.4
#> 
#> [[2]]$Petal.Width
#> [1] 0.2
#> 
#> [[2]]$Species
#> [1] 1

answered Nov 24 '17 at 18:23

Mike Stanley


Oh, I'm not OP, but this is *exactly* what I needed (and I use mostly tidyverse). Thank you for this solution! – michdn Jul 18 '22 at 14:59
The split and the lapply solutions produce a list of dataframes. The purrr solution produces a list of lists. – skan Apr 19 '23 at 16:17
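
For readers without purrr, a similar list-of-lists shape can be sketched in base R. This is a swapped-in technique, not what `purrr::transpose` does internally; it relies on `.mapply` from base R, and the data frame `df` below is made up for illustration.

```r
# Illustrative data frame
df <- data.frame(x = c(1.5, 2.5), y = c("a", "b"),
                 stringsAsFactors = FALSE)

# Call list() once per row; the column names become the element names
rows <- .mapply(list, df, NULL)

str(rows[[1]])
```

Unlike the `split` result, each element here is a plain list rather than a one-row data frame.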

score 18 · Answer 4 · answered Oct 24 '19 at 12:23

A couple more options:

With asplit

asplit(xy.df, 1)
#[[1]]
#     x      y 
#0.1137 0.6936 

#[[2]]
#     x      y 
#0.6223 0.5450 

#[[3]]
#     x      y 
#0.6093 0.2827 
#....

With split and row

split(xy.df, row(xy.df)[, 1])

#$`1`
#       x      y
#1 0.1137 0.6936

#$`2`
#       x     y
#2 0.6223 0.545

#$`3`
#       x      y
#3 0.6093 0.2827
#....

data

set.seed(1234)
xy.df <- data.frame(x = runif(10),  y = runif(10))

answered Oct 24 '19 at 12:23

Ronak Shah


My xy.df is entirely numeric. asplit(xy.df, 1) gave me a numeric list out. split(xy.df, f = seq(nrow(xy.df))) did not. Thanks! – Brian Flaherty Nov 24 '20 at 05:40
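
A small sketch of how `asplit` behaves on numeric versus mixed-type input may help when choosing between the two options. The data frames `num.df` and `mixed.df` are made up for illustration; `asplit` converts a data frame to a matrix first, so mixed columns end up as character.

```r
# All-numeric input: each row comes back as a numeric vector
num.df <- data.frame(x = c(0.1, 0.2, 0.3), y = c(0.4, 0.5, 0.6))
num.rows <- asplit(num.df, 1)
num.rows[[2]]

# Mixed input: the matrix conversion turns every value into character
mixed.df <- data.frame(x = c(0.1, 0.2), g = c("a", "b"),
                       stringsAsFactors = FALSE)
mixed.rows <- asplit(mixed.df, 1)
class(mixed.rows[[1]])
```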

score 17 · Answer 5 · answered Jan 16 '13 at 15:42

If you want to completely abuse the data.frame (as I do) and like to keep the $ functionality, one way is to split your data.frame into one-line data.frames gathered in a list:

> df = data.frame(x=c('a','b','c'), y=3:1)
> df
  x y
1 a 3
2 b 2
3 c 1

# 'convert' into a list of data.frames
ldf = lapply(as.list(1:dim(df)[1]), function(x) df[x[1],])

> ldf
[[1]]
  x y
1 a 3

[[2]]
  x y
2 b 2

[[3]]
  x y
3 c 1

# and the 'coolest'
> ldf[[2]]$y
[1] 2

It is not only intellectual masturbation: it lets you 'transform' the data.frame into a list of its rows while keeping the $ indexing, which can be useful later with lapply (assuming the function you pass to lapply uses this $ indexing).

answered Jan 16 '13 at 15:42

Qiou Bi


How do we put them back together again? Turn a list of `data.frame`s into a single `data.frame`? – Aaron McDaid Oct 07 '14 at 13:21
@AaronMcDaid You can use do.call and rbind: do.call("rbind", ldf) gives back df – random_forest_fanatic Mar 04 '15 at 08:42
@AaronMcDaid Or data.table::rbindlist(). If your original data frame was large, the speed gains will be significant. – Empiromancer Jul 12 '16 at 22:04
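
The round trip discussed in these comments can be sketched in base R as follows. The name `restored` is made up for illustration, and `df` mirrors the example from the answer.

```r
# Same example data as in the answer above
df <- data.frame(x = c("a", "b", "c"), y = 3:1,
                 stringsAsFactors = FALSE)

# Split into a list of one-row data.frames
ldf <- lapply(seq_len(nrow(df)), function(i) df[i, ])

# Stack the pieces back into a single data.frame
restored <- do.call(rbind, ldf)
restored
```

For long lists, `data.table::rbindlist()` as suggested above is likely to be faster than repeated `rbind` dispatch.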

lmo · Answer 6 · 2018-06-09T10:51:15.413

I was working on this today for a data.frame (really a data.table) with millions of observations and 35 columns. My goal was to return a list of data.frames (data.tables) each with a single row. That is, I wanted to split each row into a separate data.frame and store these in a list.

Here are two methods I came up with that were roughly 3 times faster than split(dat, seq_len(nrow(dat))) for that data set. Below, I benchmark the three methods on a 7500 row, 5 column data set (iris repeated 50 times).

library(data.table)
library(microbenchmark)

microbenchmark(
  split = {dat1 <- split(dat, seq_len(nrow(dat)))},
  setDF = {dat2 <- lapply(seq_len(nrow(dat)),
                          function(i) setDF(lapply(dat, "[", i)))},
  attrDT = {dat3 <- lapply(seq_len(nrow(dat)),
                           function(i) {
                             tmp <- lapply(dat, "[", i)
                             attr(tmp, "class") <- c("data.table", "data.frame")
                             setDF(tmp)
                           })},
  datList = {datL <- lapply(seq_len(nrow(dat)),
                            function(i) lapply(dat, "[", i))},
  times = 20
)

This returns

Unit: milliseconds
       expr      min       lq     mean   median        uq       max neval
      split 861.8126 889.1849 973.5294 943.2288 1041.7206 1250.6150    20
      setDF 459.0577 466.3432 511.2656 482.1943  500.6958  750.6635    20
     attrDT 399.1999 409.6316 461.6454 422.5436  490.5620  717.6355    20
    datList 192.1175 201.9896 241.4726 208.4535  246.4299  411.2097    20

While the differences are not as large as in my previous test, the straight setDF method is significantly faster at all levels of the distribution of runs with max(setDF) < min(split) and the attr method is typically more than twice as fast.

A fourth method is the extreme champion, which is a simple nested lapply, returning a nested list. This method exemplifies the cost of constructing a data.frame from a list. Moreover, all methods I tried with the data.frame function were roughly an order of magnitude slower than the data.table techniques.

data

dat <- vector("list", 50)
for(i in 1:50) dat[[i]] <- iris
dat <- setDF(rbindlist(dat))
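
The nested `lapply` variant (`datList` above) needs only base R. Here is a minimal sketch on the first three rows of `iris`, chosen just to keep the output small, showing the shape it returns and that factor columns keep their class.

```r
# Tiny subset, only for illustration
dat <- iris[1:3, ]

# Outer lapply walks the rows, inner lapply picks element i of each column
datL <- lapply(seq_len(nrow(dat)),
               function(i) lapply(dat, "[", i))

str(datL[[1]])
```

Each element is a plain named list, which avoids the cost of building a data.frame per row.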

Artem Klevtsov · Answer 7 · 2017-04-08T15:21:21.877

It seems that the current version of the purrr package (0.2.2) offers the fastest solution:

by_row(x, function(v) list(v)[[1L]], .collate = "list")$.out

Let's compare the most interesting solutions:

data("Batting", package = "Lahman")
x <- Batting[1:10000, 1:10]
library(benchr)
library(purrr)
benchmark(
    split = split(x, seq_len(.row_names_info(x, 2L))),
    mapply = .mapply(function(...) structure(list(...), class = "data.frame", row.names = 1L), x, NULL),
    purrr = by_row(x, function(v) list(v)[[1L]], .collate = "list")$.out
)

Results:

Benchmark summary:
Time units : milliseconds 
  expr n.eval   min  lw.qu median   mean  up.qu  max  total relative
 split    100 983.0 1060.0 1130.0 1130.0 1180.0 1450 113000     34.3
mapply    100 826.0  894.0  963.0  972.0 1030.0 1320  97200     29.3
 purrr    100  24.1   28.6   32.9   44.9   40.5  183   4490      1.0

Also we can get the same result with Rcpp:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
List df2list(const DataFrame& x) {
    std::size_t nrows = x.rows();
    std::size_t ncols = x.cols();
    CharacterVector nms = x.names();
    List res(no_init(nrows));
    for (std::size_t i = 0; i < nrows; ++i) {
        List tmp(no_init(ncols));
        for (std::size_t j = 0; j < ncols; ++j) {
            switch(TYPEOF(x[j])) {
                case INTSXP: {
                    if (Rf_isFactor(x[j])) {
                        IntegerVector t = as<IntegerVector>(x[j]);
                        RObject t2 = wrap(t[i]);
                        t2.attr("class") = "factor";
                        t2.attr("levels") = t.attr("levels");
                        tmp[j] = t2;
                    } else {
                        tmp[j] = as<IntegerVector>(x[j])[i];
                    }
                    break;
                }
                case LGLSXP: {
                    tmp[j] = as<LogicalVector>(x[j])[i];
                    break;
                }
                case CPLXSXP: {
                    tmp[j] = as<ComplexVector>(x[j])[i];
                    break;
                }
                case REALSXP: {
                    tmp[j] = as<NumericVector>(x[j])[i];
                    break;
                }
                case STRSXP: {
                    tmp[j] = as<std::string>(as<CharacterVector>(x[j])[i]);
                    break;
                }
                default: stop("Unsupported type '%s'.", type2name(x));
            }
        }
        tmp.attr("class") = "data.frame";
        tmp.attr("row.names") = 1;
        tmp.attr("names") = nms;
        res[i] = tmp;
    }
    res.attr("names") = x.attr("row.names");
    return res;
}

Now compare with purrr:

benchmark(
    purrr = by_row(x, function(v) list(v)[[1L]], .collate = "list")$.out,
    rcpp = df2list(x)
)

Results:

Benchmark summary:
Time units : milliseconds 
 expr n.eval  min lw.qu median mean up.qu   max total relative
purrr    100 25.2  29.8   37.5 43.4  44.2 159.0  4340      1.1
 rcpp    100 19.0  27.9   34.3 35.8  37.2  93.8  3580      1.0

Benchmarking on a tiny data set of 150 rows doesn't make much sense, as no one will notice any difference in microseconds and it doesn't scale – David Arenburg Mar 26 '17 at 06:56
And in addition to being in purrrlyr, it's about to be deprecated. There are now other methods combining tidyr::nest, dplyr::mutate and purrr::map to achieve the same result – Mike Stanley Nov 24 '17 at 18:19

score 2 · Answer 8 · answered Sep 18 '16 at 18:09

An alternative is to convert the df to a matrix and then apply lapply over it: ldf <- lapply(as.matrix(myDF), function(x)x)

answered Sep 18 '16 at 18:09

user3553260


This was the best way for me -- I had to discover it by trial and error, was about to add it to the list of solutions because it was far easier to implement and precisely what I was looking for, but here it is already. Should be higher rated answer in my opinion. – cmcgraw Feb 04 '23 at 21:24
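
It may be worth checking the shape this produces before relying on it. In the sketch below (the data frame `myDF` is made up, and `cells` and `rows` are illustrative names), `lapply` over a matrix visits every cell, so the result has one element per value; indexing the matrix by row is one way to get one element per row instead.

```r
# Illustrative all-integer data frame
myDF <- data.frame(a = 1:3, b = 4:6)
m <- as.matrix(myDF)

# lapply treats the matrix as a plain vector: one element per cell
cells <- lapply(m, function(x) x)
length(cells)

# One element per row instead, by indexing the matrix row-wise
rows <- lapply(seq_len(nrow(m)), function(i) m[i, ])
length(rows)
```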

score 2 · Answer 9 · answered Jun 04 '17 at 22:27

The best way for me was:

Example data:

ID   <- 1:3
Var1 <- c("X1", "X4", "X7")
Var2 <- c("X2", "X5", "X8")
Var3 <- c("X3", "X6", "X9")

data <- data.frame(ID, Var1, Var2, Var3)

ID    Var1   Var2  Var3 
1      X1     X2    X3
2      X4     X5    X6
3      X7     X8    X9

We call the BBmisc library

library(BBmisc)

data$lists<-convertRowsToList(data[,2:4])

And the result will be:

ID    Var1   Var2  Var3  lists
1      X1     X2    X3   list("X1", "X2", "X3")
2      X4     X5    X6   list("X4", "X5", "X6")
3      X7     X8    X9   list("X7", "X8", "X9")

score 2 · Answer 10 · edited Nov 26 '17 at 11:55

As @flodel wrote, this converts your data frame into a list with as many elements as the data frame has rows:

NewList <- split(df, f = seq(nrow(df)))

You can additionally add a function to select only those columns that are not NA in each element of the list:

NewList2 <- lapply(NewList, function(x) x[,!is.na(x)])
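
A minimal sketch of this NA filtering on made-up data follows. It is a slight variant, not the exact code above: it uses `anyNA` per column and `drop = FALSE`, so an element stays a data.frame even when only one column survives. The names `rows` and `kept` are illustrative.

```r
# Illustrative data frame with missing values in different columns
df <- data.frame(a = c(1, NA), b = c("u", "v"), c = c(NA, 3),
                 stringsAsFactors = FALSE)

# One one-row data.frame per row
rows <- split(df, f = seq_len(nrow(df)))

# Keep only the columns that are not NA in each row
kept <- lapply(rows, function(x) {
  keep <- !vapply(x, anyNA, logical(1))
  x[, keep, drop = FALSE]
})

lapply(kept, names)
```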

edited Nov 26 '17 at 11:55

David Arenburg


answered Sep 29 '17 at 08:35

michal


score 1 · Answer 11 · answered Sep 28 '16 at 18:46

Another alternative using library(purrr) (that seems to be a bit quicker on large data.frames)

flatten(by_row(xy.df, ..f = function(x) flatten_chr(x), .labels = FALSE))

answered Sep 28 '16 at 18:46

MrHopko


`by_row()` has now moved to `library(purrrlyr)` – MrHopko Aug 17 '17 at 09:35

score 0 · Answer 12 · answered Jun 03 '17 at 19:22

The by_row function from the purrrlyr package will do this for you.

This example demonstrates

myfn <- function(row) {
  #row is a tibble with one row, and the same number of columns as the original df
  l <- as.list(row)
  return(l)
}

list_of_lists <- purrrlyr::by_row(df, myfn, .labels=FALSE)$.out

By default, the returned value from myfn is put into a new list column in the df called .out. The $.out at the end of the above statement immediately selects this column, returning a list of lists.

score 0 · Answer 13 · answered Jul 10 '23 at 14:04

You can use the very fast collapse::mrtl:

library(collapse)
mrtl(as.matrix(xy.df))

answered Jul 10 '23 at 14:04

Maël

