I was reading through the column-wise operations documentation for tidyverse's dplyr here: https://dplyr.tidyverse.org/articles/colwise.html, and toward the end of the article there are three bullet points, the first of which reads as follows:

"You can have a column of a data frame that is itself a data frame. This is something provided by base R, but it’s not very well documented, and it took a while to see that it was useful, not just a theoretical curiosity."

I'm not sure I understand what this means. Can someone provide example code showing how to create a data frame that has a column that is itself a data frame?

jay.sf
StatsStudent

3 Answers

A quick example uses dplyr::nest_by, which produces a data column that is a list-column whose elements are data frames.

Here, each data frame in the data column holds the rows for one species.

For more intuition, see the documentation for tidyr::nest.

library(dplyr)

iris %>% nest_by(Species)

#> # A tibble: 3 × 2
#> # Rowwise:  Species
#>   Species                  data
#>   <fct>      <list<tibble[,4]>>
#> 1 setosa               [50 × 4]
#> 2 versicolor           [50 × 4]
#> 3 virginica            [50 × 4]
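
To see what is stored in the data column, you can pull out a single element. This is only a small sketch building on the call above; the names nested and setosa are made up for illustration.

```r
library(dplyr)

# Same call as above, stored in a (hypothetical) variable
nested <- iris %>% nest_by(Species)

# The data column is a list; each element is an ordinary tibble
setosa <- nested$data[[1]]

is.data.frame(setosa)       # TRUE
dim(setosa)                 # 50 rows, 4 columns
is.data.frame(nested$data)  # FALSE: the column is a list of data frames
```

So strictly speaking this is a list-column holding data frames, not a single data frame used as a column.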
shafee

Actually, a data.frame column can be a list. In base R (version 4.0.0 or later) we can use list2DF to create a data.frame from a list. Note that data.frames are just a special kind of list (Ref.).

To make a data.frame out of a vector and a list, we can do:

df <- list2DF(list(X1=1:3, X2=list(1:3, 1:3, 1:3)))
df
#   X1      X2
# 1  1 1, 2, 3
# 2  2 1, 2, 3
# 3  3 1, 2, 3

where

str(df)
# 'data.frame': 3 obs. of  2 variables:
# $ X1: int  1 2 3
# $ X2:List of 3
#  ..$ : int  1 2 3
#  ..$ : int  1 2 3
#  ..$ : int  1 2 3
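
As a quick check, and assuming the same df as above, the following sketch shows that X2 is a list-column rather than a data frame column:

```r
# Rebuild the example so this snippet runs on its own
df <- list2DF(list(X1 = 1:3, X2 = list(1:3, 1:3, 1:3)))

is.list(df$X2)        # TRUE
is.data.frame(df$X2)  # FALSE
df$X2[[2]]            # the integer vector stored in row 2
```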
jay.sf

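You can construct such a data.frame using I(), which keeps the inner data frame as a single column instead of splicing its columns into the outer one.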

df <- data.frame(a = 1:10,
                 b = I(data.frame(a = 1:10, b = letters[1:10])))

Although df is not printable, you can check its contents:

df$b
##>     a b
##> 1   1 a
##> 2   2 b
##> 3   3 c
##> 4   4 d
##> 5   5 e
##> 6   6 f
##> 7   7 g
##> 8   8 h
##> 9   9 i
##> 10 10 j

Or, more conveniently, convert it to a tibble:

tibble::as_tibble(df)
#> # A tibble: 10 × 2
#>        a   b$a $b   
#>    <int> <int> <chr>
#>  1     1     1 a    
#>  2     2     2 b    
#>  3     3     3 c    
#>  4     4     4 d    
#>  5     5     5 e    
#>  6     6     6 f    
#>  7     7     7 g    
#>  8     8     8 h    
#>  9     9     9 i    
#> 10    10    10 j    
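
Another way to get the same structure in base R, as far as I can tell, is to assign a data frame to a new column with `$<-`. This is a minimal sketch; the names df2, id, info, x and y are hypothetical.

```r
# Outer data frame with one ordinary column
df2 <- data.frame(id = 1:3)

# Assign a whole data frame as a single column (row counts must match)
df2$info <- data.frame(x = c(10, 20, 30), y = c("a", "b", "c"))

ncol(df2)               # 2: info counts as one column
is.data.frame(df2$info) # TRUE
df2$info$x              # reach into the nested data frame
```

The outer data frame still has only two columns; the inner columns are reached through df2$info.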
Stefano Barbi