I am learning R and want to know the difference between populating an empty list and an empty data frame. My required dataset will have 3 variables, with 51 observations each.

Using empty list:

zscoremds <- list()
for(col in names(mds_numbers)) { 
  zscoremds[[col]] = zscore(mds_numbers[[col]]) 
}

mds_numbers is a 51x3 data frame with named columns; zscore is a function that calculates the z-score of each element in a column.

Using empty data frame:

zscoremds <- data.frame()
for (j in 1:3) {
  newcol <- zscore(mds_numbers[[j]])
  zscoremds <- cbind(zscoremds, newcol)
}

This does not work. I get the error "differing number of rows: 0,51".

However, when I pre-allocate the data frame to have 51 rows, it works:

zscoremds <- data.frame(matrix(nrow = 51, ncol = 0))
for (j in 1:3) {
  newcol <- zscore(mds_numbers[[j]])
  zscoremds <- cbind(zscoremds, newcol)
}

Why does it work on an empty list, but not on an empty data frame if I add new columns to it?

  • Related, but not duplicate: [Add Columns to an empty data frame in R](https://stackoverflow.com/q/26684072/8366499) – divibisan Mar 02 '23 at 18:00

1 Answer

A list can have elements of different length:

test_list <- list(
  a = 1:10,
  b = "hello there",
  c = list(1)
)
test_list
#> $a
#>  [1]  1  2  3  4  5  6  7  8  9 10
#> 
#> $b
#> [1] "hello there"
#> 
#> $c
#> $c[[1]]
#> [1] 1

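However, to bind dataframes by columns, both dataframes must have the same number of rows. A dataframe created with data.frame() and no arguments has 0 rows, which is why binding a 51-row column to it fails. The same rule applies when you create a dataframe with data.frame() (see an exception at the end of the answer):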

data.frame(
  x = 1:2, # 2 rows
  y = 1:3 # 3 rows
)
#> Error in data.frame(x = 1:2, y = 1:3): arguments imply differing number of rows: 2, 3

cbind(
  data.frame(), # 0 rows
  data.frame(y = 1) # 1 row
)
#> Error in data.frame(..., check.names = FALSE): arguments imply differing number of rows: 0, 1

cbind(
  data.frame(x = 0), # 1 row
  data.frame(y = 1) # 1 row
)
#>   x y
#> 1 0 1
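
To connect this to the two attempts in the question, here is a minimal sketch comparing the row counts of the two "empty" dataframes. The names empty_df, prealloc_df and newcol are made up for illustration, and newcol is just a stand-in for one 51-value column of z-scores:

```r
# Two kinds of "empty" dataframe
empty_df <- data.frame()                                 # no rows, no columns
prealloc_df <- data.frame(matrix(nrow = 51, ncol = 0))   # 51 rows, no columns

nrow(empty_df)
#> [1] 0
nrow(prealloc_df)
#> [1] 51

# Stand-in for one column of 51 z-scores (hypothetical data)
newcol <- seq_len(51) / 51

# Binding to the 0-row dataframe fails: 0 rows vs 51 rows
failed <- tryCatch(cbind(empty_df, newcol), error = function(e) e)
inherits(failed, "error")
#> [1] TRUE

# Binding to the 51-row dataframe works: 51 rows vs 51 rows
result <- cbind(prealloc_df, newcol)
nrow(result)
#> [1] 51
```

So the pre-allocated version works only because its row count already matches the length of the new column.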

A dataframe is a list of elements of the same length with some additional attributes (row names, column names, etc.). Therefore, one way to avoid the error is to build a list first and then convert it to a dataframe:

my_df <- list()
for (i in 1:5) {
  my_df[[paste0("x", i)]] <- rep(i, 3) * 2
}
as.data.frame(my_df)
#>   x1 x2 x3 x4 x5
#> 1  2  4  6  8 10
#> 2  2  4  6  8 10
#> 3  2  4  6  8 10

But again you need to make sure that all these elements have the same length before converting them to a dataframe. It seems you already found out some of these behaviors so I’m not sure this answers the question.
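
Applied to the situation in the question, this could look roughly like the sketch below. Since the original mds_numbers and zscore are not shown, both are replaced here by hypothetical stand-ins (random data and a simple standardisation function), so treat them as assumptions rather than the asker's actual objects:

```r
# Hypothetical stand-ins for the objects in the question
set.seed(1)
mds_numbers <- data.frame(a = rnorm(51), b = rnorm(51), c = rnorm(51))
zscore <- function(x) (x - mean(x)) / sd(x)   # assumed definition

# Fill a list column by column, then convert once at the end
zscore_list <- list()
for (col in names(mds_numbers)) {
  zscore_list[[col]] <- zscore(mds_numbers[[col]])
}
zscoremds <- as.data.frame(zscore_list)

nrow(zscoremds)
#> [1] 51
names(zscoremds)
#> [1] "a" "b" "c"

# Same idea without an explicit loop: a dataframe is a list of columns,
# so lapply() visits each column and returns a named list
zscoremds2 <- as.data.frame(lapply(mds_numbers, zscore))
```

Converting once at the end also avoids calling cbind() on every iteration.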


Regarding the data.frame() function, note one special case: when an input has length 1, it is recycled (i.e. repeated) across all rows:

data.frame(
  x = 1:2, # 2 rows
  y = 0 # 1 value but it gets repeated on all rows
)
#>   x y
#> 1 1 0
#> 2 2 0
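
As a small addition, recycling in data.frame() is not limited to inputs of length 1: a shorter atomic vector is also repeated when the number of rows is an exact multiple of its length. A short sketch (the object names recycled and not_recycled are only illustrative):

```r
# y has length 2 and x has length 4: y is repeated twice
recycled <- data.frame(x = 1:4, y = 1:2)
recycled$y
#> [1] 1 2 1 2

# 3 is not a multiple of 2, so this is an error again
not_recycled <- tryCatch(
  data.frame(x = 1:3, y = 1:2),
  error = function(e) e
)
inherits(not_recycled, "error")
#> [1] TRUE
```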
bretauv