173

I have a data.frame that I would like to convert to a list by rows, so that each row becomes its own list element. In other words, I would like a list with as many elements as the data.frame has rows.

So far, I've tackled this problem in the following manner, but I was wondering if there's a better way to approach this.

xy.df <- data.frame(x = runif(10),  y = runif(10))

# pre-allocate a list and fill it with a loop
xy.list <- vector("list", nrow(xy.df))
for (i in seq_len(nrow(xy.df))) {
    xy.list[[i]] <- xy.df[i,]
}
Roman Luštrik

13 Answers

195

Like this:

xy.list <- split(xy.df, seq(nrow(xy.df)))

And if you want the rownames of xy.df to be the names of the output list, you can do:

xy.list <- setNames(split(xy.df, seq(nrow(xy.df))), rownames(xy.df))
flodel
  • 9
    Note that, after using `split` each element has type `data.frame with 1 rows and N columns` instead of `list of length N` – Karol Daniluk Feb 19 '19 at 19:40
  • I would only add that if you use `split` you should probably do `drop=T` otherwise your original levels for factors won't drop – Denis Jan 07 '20 at 16:50
  • @KarolDaniluk you can call `xy.list2 = lapply(xy.list,as.list)` to make its elements lists or `xy.list2 = lapply(xy.list,as.anything)`. – Luke Feb 11 '22 at 14:12
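The point raised in the comments above can be sketched as follows. This is a minimal illustration, assuming data shaped like the question's `xy.df` (the seed and the three-row size are arbitrary choices, not from the original post): `split` yields one-row data.frames, and `lapply(..., as.list)` turns each of them into a plain named list.

```r
# Sketch only: example data modeled on the question's xy.df
set.seed(1)
xy.df <- data.frame(x = runif(3), y = runif(3))

# One element per row; each element is still a one-row data.frame
xy.list <- split(xy.df, seq_len(nrow(xy.df)))
class(xy.list[[1]])

# Convert each one-row data.frame into a plain named list
xy.list2 <- lapply(xy.list, as.list)
class(xy.list2[[1]])
names(xy.list2[[1]])
```

Which form is preferable depends on what the downstream code expects: one-row data.frames keep row names and `[i, j]` indexing, while plain lists are lighter.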
64

Eureka!

xy.list <- as.list(as.data.frame(t(xy.df)))
Roman Luštrik
  • Beat me ;-) . Still, if you'd like just to loop over these values, better use apply. – mbq Aug 16 '10 at 13:16
  • 1
    Care to demonstrate how to use apply? – Roman Luštrik Aug 17 '10 at 06:04
  • 5
    `unlist(apply(xy.df, 1, list), recursive = FALSE)`. However, flodel's solution is more efficient than using `apply` or `t`. – Arun May 14 '13 at 09:13
  • 13
    The problem here is that `t` converts the `data.frame` to a `matrix`, so the elements in your list are atomic vectors, not lists as the OP requested. It is usually not a problem until your `xy.df` contains mixed types... – Calimo Feb 28 '14 at 14:40
  • 2
    If you want to loop over the values, I do not recommend `apply`. It's actually just a for loop implemented in R. `lapply` performs the looping in C, which is significantly faster. This list-of-rows format is actually preferable if you're doing a lot of looping. – Eli Sander Dec 21 '15 at 16:54
  • 1
    Adding another comment from the future, an `apply` version is `.mapply(data.frame, xy.df, NULL)` – alexis_laz Jul 24 '16 at 08:40
  • Many a year later, you could do `lapply(transpose(unclass(dat)), as.list)` with the `data.table` package to return the same result. The one issue with these methods is that they convert each resulting element to a character vector if any columns of the original data.frame are non-numeric. My (super late) answer provides a couple of methods that preserve the data types. – lmo Mar 16 '18 at 00:15
  • This made my day!! Super fast solution, great for package developers! – MS Berends Jul 03 '20 at 08:21
  • This solution changes the type of the data. It casts all data to the most generic column type. – skan Apr 19 '23 at 16:14
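The type-coercion issue mentioned in several comments above can be illustrated with a small sketch. The data frame `df` and the name `row_lists` are hypothetical, chosen only for illustration; the row-wise alternative shown is one base R option among several, not the method of any particular answer.

```r
# Sketch only: hypothetical mixed-type data (not from the original post)
df <- data.frame(id = 1:2, name = c("a", "b"), stringsAsFactors = FALSE)

# t() goes through a character matrix, so every value becomes a string
via_t <- as.list(as.data.frame(t(df), stringsAsFactors = FALSE))
class(via_t[[1]])

# Extracting row i from each column keeps every column's own type
row_lists <- lapply(seq_len(nrow(df)), function(i) lapply(df, `[[`, i))
class(row_lists[[1]]$id)
class(row_lists[[1]]$name)
```

For an all-numeric data.frame such as the question's `xy.df`, the `t` approach loses nothing; the difference only shows up with mixed column types.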
24

A more modern solution uses only purrr::transpose:

library(purrr)
iris[1:2,] %>% purrr::transpose()
#> [[1]]
#> [[1]]$Sepal.Length
#> [1] 5.1
#> 
#> [[1]]$Sepal.Width
#> [1] 3.5
#> 
#> [[1]]$Petal.Length
#> [1] 1.4
#> 
#> [[1]]$Petal.Width
#> [1] 0.2
#> 
#> [[1]]$Species
#> [1] 1
#> 
#> 
#> [[2]]
#> [[2]]$Sepal.Length
#> [1] 4.9
#> 
#> [[2]]$Sepal.Width
#> [1] 3
#> 
#> [[2]]$Petal.Length
#> [1] 1.4
#> 
#> [[2]]$Petal.Width
#> [1] 0.2
#> 
#> [[2]]$Species
#> [1] 1
Mike Stanley
  • Oh, I'm not OP, but this is *exactly* what I needed (and I use mostly tidyverse). Thank you for this solution! – michdn Jul 18 '22 at 14:59
  • The split and the lapply solutions produce a list of dataframes. The purrr solution produces a list of lists. – skan Apr 19 '23 at 16:17
18

A couple more options:

With asplit

asplit(xy.df, 1)
#[[1]]
#     x      y 
#0.1137 0.6936 

#[[2]]
#     x      y 
#0.6223 0.5450 

#[[3]]
#     x      y 
#0.6093 0.2827 
#....

With split and row

split(xy.df, row(xy.df)[, 1])

#$`1`
#       x      y
#1 0.1137 0.6936

#$`2`
#       x     y
#2 0.6223 0.545

#$`3`
#       x      y
#3 0.6093 0.2827
#....

data

set.seed(1234)
xy.df <- data.frame(x = runif(10),  y = runif(10))
Ronak Shah
  • My xy.df is entirely numeric. asplit(xy.df, 1) gave me a numeric list out. split(xy.df, f = seq(nrow(xy.df))) did not. Thanks! – Brian Flaherty Nov 24 '20 at 05:40
17

If you want to completely abuse the data.frame (as I do) and like to keep the $ functionality, one way is to split your data.frame into one-row data.frames gathered in a list:

> df = data.frame(x=c('a','b','c'), y=3:1)
> df
  x y
1 a 3
2 b 2
3 c 1

# 'convert' into a list of data.frames
> ldf = lapply(seq_len(nrow(df)), function(i) df[i, ])

> ldf
[[1]]
  x y
1 a 3

[[2]]
  x y
2 b 2

[[3]]
  x y
3 c 1

# and the 'coolest'
> ldf[[2]]$y
[1] 2

This is not just an intellectual exercise: it 'transforms' the data.frame into a list of its rows while keeping $ indexing, which is useful later with lapply (assuming the function you pass to lapply relies on $ indexing).

Qiou Bi
9

I was working on this today for a data.frame (really a data.table) with millions of observations and 35 columns. My goal was to return a list of data.frames (data.tables) each with a single row. That is, I wanted to split each row into a separate data.frame and store these in a list.

Here are two methods I came up with that were roughly 3 times faster than split(dat, seq_len(nrow(dat))) for that data set. Below, I benchmark the three methods on a 7500 row, 5 column data set (iris repeated 50 times).

library(data.table)
library(microbenchmark)

microbenchmark(
split={dat1 <- split(dat, seq_len(nrow(dat)))},
setDF={dat2 <- lapply(seq_len(nrow(dat)),
                  function(i) setDF(lapply(dat, "[", i)))},
attrDT={dat3 <- lapply(seq_len(nrow(dat)),
           function(i) {
             tmp <- lapply(dat, "[", i)
             attr(tmp, "class") <- c("data.table", "data.frame")
             setDF(tmp)
           })},
datList = {datL <- lapply(seq_len(nrow(dat)),
                          function(i) lapply(dat, "[", i))},
times=20
) 

This returns

Unit: milliseconds
       expr      min       lq     mean   median        uq       max neval
      split 861.8126 889.1849 973.5294 943.2288 1041.7206 1250.6150    20
      setDF 459.0577 466.3432 511.2656 482.1943  500.6958  750.6635    20
     attrDT 399.1999 409.6316 461.6454 422.5436  490.5620  717.6355    20
    datList 192.1175 201.9896 241.4726 208.4535  246.4299  411.2097    20

While the differences are not as large as in my previous test, the straight setDF method is significantly faster at all levels of the distribution of runs with max(setDF) < min(split) and the attr method is typically more than twice as fast.

A fourth method is the extreme champion, which is a simple nested lapply, returning a nested list. This method exemplifies the cost of constructing a data.frame from a list. Moreover, all methods I tried with the data.frame function were roughly an order of magnitude slower than the data.table techniques.

data

dat <- vector("list", 50)
for(i in 1:50) dat[[i]] <- iris
dat <- setDF(rbindlist(dat))
lmo
7

It seems the current version of the purrr package (0.2.2) offers the fastest solution:

by_row(x, function(v) list(v)[[1L]], .collate = "list")$.out

Let's compare the most interesting solutions:

data("Batting", package = "Lahman")
x <- Batting[1:10000, 1:10]
library(benchr)
library(purrr)
benchmark(
    split = split(x, seq_len(.row_names_info(x, 2L))),
    mapply = .mapply(function(...) structure(list(...), class = "data.frame", row.names = 1L), x, NULL),
    purrr = by_row(x, function(v) list(v)[[1L]], .collate = "list")$.out
)

Results:

Benchmark summary:
Time units : milliseconds 
  expr n.eval   min  lw.qu median   mean  up.qu  max  total relative
 split    100 983.0 1060.0 1130.0 1130.0 1180.0 1450 113000     34.3
mapply    100 826.0  894.0  963.0  972.0 1030.0 1320  97200     29.3
 purrr    100  24.1   28.6   32.9   44.9   40.5  183   4490      1.0

We can also get the same result with Rcpp:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
List df2list(const DataFrame& x) {
    std::size_t nrows = x.rows();
    std::size_t ncols = x.cols();
    CharacterVector nms = x.names();
    List res(no_init(nrows));
    for (std::size_t i = 0; i < nrows; ++i) {
        List tmp(no_init(ncols));
        for (std::size_t j = 0; j < ncols; ++j) {
            switch(TYPEOF(x[j])) {
                case INTSXP: {
                    if (Rf_isFactor(x[j])) {
                        IntegerVector t = as<IntegerVector>(x[j]);
                        RObject t2 = wrap(t[i]);
                        t2.attr("class") = "factor";
                        t2.attr("levels") = t.attr("levels");
                        tmp[j] = t2;
                    } else {
                        tmp[j] = as<IntegerVector>(x[j])[i];
                    }
                    break;
                }
                case LGLSXP: {
                    tmp[j] = as<LogicalVector>(x[j])[i];
                    break;
                }
                case CPLXSXP: {
                    tmp[j] = as<ComplexVector>(x[j])[i];
                    break;
                }
                case REALSXP: {
                    tmp[j] = as<NumericVector>(x[j])[i];
                    break;
                }
                case STRSXP: {
                    tmp[j] = as<std::string>(as<CharacterVector>(x[j])[i]);
                    break;
                }
                default: stop("Unsupported type '%s'.", Rf_type2char(TYPEOF(x[j])));
            }
        }
        tmp.attr("class") = "data.frame";
        tmp.attr("row.names") = 1;
        tmp.attr("names") = nms;
        res[i] = tmp;
    }
    res.attr("names") = x.attr("row.names");
    return res;
}

Now compare with purrr:

benchmark(
    purrr = by_row(x, function(v) list(v)[[1L]], .collate = "list")$.out,
    rcpp = df2list(x)
)

Results:

Benchmark summary:
Time units : milliseconds 
 expr n.eval  min lw.qu median mean up.qu   max total relative
purrr    100 25.2  29.8   37.5 43.4  44.2 159.0  4340      1.1
 rcpp    100 19.0  27.9   34.3 35.8  37.2  93.8  3580      1.0
Artem Klevtsov
  • benchmarking on a tiny data set of 150 rows doesn't make much sense as no one will notice any difference in microseconds and it doesn't scale – David Arenburg Mar 26 '17 at 06:56
  • 4
    `by_row()` has now moved to `library(purrrlyr)` – MrHopko May 26 '17 at 16:20
  • And in addition to being in purrrlyr, it's about to be deprecated. There are now other methods combining tidyr::nest, dplyr::mutate and purrr::map to achieve the same result – Mike Stanley Nov 24 '17 at 18:19
2

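An alternative is to convert the data.frame to a matrix and then apply lapply over it: ldf <- lapply(as.matrix(myDF), function(x) x). Note that lapply visits a matrix cell by cell, so the resulting list has one element per cell rather than one per row.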

user3553260
  • 1
    This was the best way for me -- I had to discover it by trial and error, was about to add it to the list of solutions because it was far easier to implement and precisely what I was looking for, but here it is already. Should be higher rated answer in my opinion. – cmcgraw Feb 04 '23 at 21:24
2

The best way for me was:

Example data:

Var1 <- c("X1", "X4", "X7")
Var2 <- c("X2", "X5", "X8")
Var3 <- c("X3", "X6", "X9")

Data <- data.frame(ID = 1:3, Var1, Var2, Var3)

ID    Var1   Var2  Var3 
1      X1     X2    X3
2      X4     X5    X6
3      X7     X8    X9

We call the BBmisc library

library(BBmisc)

Data$lists <- convertRowsToList(Data[, 2:4])

And the result will be:

ID    Var1   Var2  Var3  lists
1      X1     X2    X3   list("X1", "X2", "X3") 
2      X4     X5    X6   list("X4", "X5", "X6") 
3      X7     X8    X9   list("X7", "X8", "X9") 
RRuiz
2

As @flodel wrote, this converts your data.frame into a list with as many elements as the data.frame has rows:

NewList <- split(df, f = seq(nrow(df)))

You can additionally add a function to select only those columns that are not NA in each element of the list:

NewList2 <- lapply(NewList, function(x) x[,!is.na(x)])
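A small sketch of the NA-filtering step, under stated assumptions: the data frame `df` below is hypothetical, and `drop = FALSE` is an addition not present in the answer, included so that an element stays a data.frame even when only one column survives.

```r
# Sketch only: hypothetical data with missing values
df <- data.frame(a = c(1, NA), b = c("x", "y"), c = c(NA, 2.5))

# One one-row data.frame per row
rows <- split(df, seq_len(nrow(df)))

# For each row, keep only the columns whose value is not NA
non_na <- lapply(rows, function(x) {
  keep <- !vapply(x, is.na, logical(1))
  x[, keep, drop = FALSE]
})

names(non_na[[1]])
names(non_na[[2]])
```

Because each row can lose different columns, the elements of the result no longer share a common set of names.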
David Arenburg
michal
1

Another alternative using library(purrr) (that seems to be a bit quicker on large data.frames)

flatten(by_row(xy.df, ..f = function(x) flatten_chr(x), .labels = FALSE))
MrHopko
0

The by_row function from the purrrlyr package will do this for you.

This example demonstrates:

myfn <- function(row) {
  #row is a tibble with one row, and the same number of columns as the original df
  l <- as.list(row)
  return(l)
}

list_of_lists <- purrrlyr::by_row(df, myfn, .labels=FALSE)$.out

By default, the returned value from myfn is put into a new list column in the df called .out. The $.out at the end of the above statement immediately selects this column, returning a list of lists.

RobinL
0

You can use the very fast collapse::mrtl:

library(collapse)
mrtl(as.matrix(xy.df))
Maël