Stacking two vectors into one column of data.frame with additional ID column

Question

I have data in multiple vectors that I would like to convert to a data.frame with one ID column (vector name) and one data column (vector values). Here's a toy example:

library(dplyr)  # provides bind_rows()

data.1 <- c(1, 2)
data.2 <- c(10, 20, 30)
df <- bind_rows(data.frame(ID="data.1", value=data.1), data.frame(ID="data.2", value=data.2))

If I have another vector (or any other data structure) that contains the names of the variables as character strings, how can I elegantly shorten the code? I would need each entry once as a character string (for ID) and once as a variable name (for value).

studies <- c("data.1", "data.2")

You can put them in a named list first, `df_list = list(data.frame(values = data.1), data.frame(values = data.2)); names(df_list) = studies; bind_rows(df_list, .id = "ID")`. But of course this is simpler if [you've been using lists from the start](https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames/24376207#24376207) rather than sequentially named objects. — Gregor Thomas, Nov 30 '20 at 13:59
Thank you both (@markus and @GregorThomas) for your answers. Both solutions work, but I will need some time to understand each one and whether they solve my problem generally or are specific to the toy example I posted. – Mario Niepel, Nov 30 '20 at 14:15
I think that lists are the more intuitive way for me to go. Here's my solution: `studies <- c("data.1", "data.2"); data.1 <- c(1, 2); data.2 <- c(10, 20, 30); ls <- list(data.1, data.2); names(ls) <- studies; df <- data.frame(study=factor(), value=double()); for (i in 1:length(ls)) {df <- add_row(df, study=names(ls)[i], value=ls[[i]])}` — Mario Niepel, Nov 30 '20 at 15:14
One remaining question I have: Is the for loop the best way to cycle through the list to create the data frame? Or is there an even more elegant way to cycle through every element of the list? — Mario Niepel, Nov 30 '20 at 15:21

score 0 · Answer 1 · edited Nov 30 '20 at 14:17

You can define a function f that returns a data.frame whose ID column holds the name of the object you pass into the function:

f <- function(x){
  return(data.frame(ID=deparse(substitute(x)), value=x))
}

So, you can define your new data.frame as follows:

library(dplyr)

data.1 <- c(1, 2)
data.2 <- c(10, 20, 30)

bind_rows(f(data.1), f(data.2))

It looks much more elegant to me because you don't need to write the name of each source twice.
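If the vector names are already stored in a character vector such as `studies` from the question, the enumeration can probably be avoided altogether. The following is only a sketch in base R, assuming the vectors exist in the environment where the code runs; `vectors` and `pieces` are illustrative names, not part of the answer above.

```r
data.1 <- c(1, 2)
data.2 <- c(10, 20, 30)
studies <- c("data.1", "data.2")

# Look up each vector by its name; the result is a named list (name -> vector)
vectors <- mget(studies)

# Build one small data.frame per vector, labelled with the vector's name
pieces <- Map(function(id, value) data.frame(ID = id, value = value),
              names(vectors), vectors)

# Stack the pieces; unname() keeps the row names as plain 1..n
df <- do.call(rbind, unname(pieces))
```

Here each name is used once as a string (for ID) and once as a lookup key (for value), which is the double use the question describes.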

answered Nov 30 '20 at 14:11 by manuzambo · edited Nov 30 '20 at 14:17 by Dharman

Thank you Manuel. This does make it nicer, but is not really what I was looking for. I would still have to enumerate every variable (now in a function). The key advance I was looking for is how to loop efficiently through an object so this process is automated. I think I can figure this out using lists. – Mario Niepel Nov 30 '20 at 15:12

score 0 · Accepted Answer · answered Dec 01 '20 at 17:59

I think I found a general solution via lists that only uses base R functions and is generally pretty simple. The most complicated part is keeping the names intact if they have different lengths, or if the number of values per study spans more than one order of magnitude (unlist then appends a different number of characters to each name).
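To make the naming issue concrete, here is a small illustration of how `unlist` modifies names. The list `x` and its element names are made up for this example.

```r
# Named list with vectors of different lengths
x <- list(a = c(1, 2), b = 3, c = seq(10, 100, by = 10))

# unlist() appends an index to the name of every element that comes from
# a vector of length > 1; the name of a length-1 vector is left unchanged
unlisted_names <- names(unlist(x))
print(unlisted_names)
```

The suffix is one character wide for `a1`, two characters wide for `c10`, and absent for `b`, so stripping a fixed number of trailing characters only works after the names have been padded to a common width.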

Thank you @Gregor for pointing me toward working with lists.

    # data input
    studies <- c("study1", "std2", "This name is very long")
    data_1 <- c(1, 2)
    data_2 <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
    data_3 <- c(3, 6, 9, 12, 15)
    # generate list with data and assign names of identical length
    data_list <- list(data_1, data_2, data_3)
    names(data_list) <- format(studies, width=max(nchar(studies)))
    # get names from 'unlist', make them the same length, remove last chars
    # (removing two characters assumes fewer than 100 values per study)
    names <- names(unlist(data_list))
    names <- format(names, width=max(nchar(names)))
    names <- substr(names,1,nchar(names)-2)
    # get values from 'unlist'
    values <- unname(unlist(data_list))
    # make data.frame
    data <- data.frame(names, values)
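A possible alternative that sidesteps the name padding entirely is to repeat each study name as many times as its vector has values, instead of recovering the names from `unlist`. This is a sketch of a different technique than the one above, using only base R (`rep`, `lengths`, `unlist`); `result` is an illustrative name.

```r
studies <- c("study1", "std2", "This name is very long")
data_list <- list(c(1, 2), seq(10, 100, by = 10), c(3, 6, 9, 12, 15))
names(data_list) <- studies

result <- data.frame(
  # repeat each study name once per value in its vector
  names  = rep(names(data_list), times = lengths(data_list)),
  # flatten the values without generating suffixed names
  values = unlist(data_list, use.names = FALSE)
)
```

Because the names are never modified, this should work regardless of name length or the number of values per study, and the IDs carry no trailing padding.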

Even shorter way to get the names from unlist to be the right length: `names <- substr(names(unlist(data_list)),1,max(nchar(studies)))` Replaces three lines with one and way less manipulations. — Mario Niepel, Dec 02 '20 at 02:32
