I am hoping to determine an efficient way to convert a list of data frames into a single data frame. Below is my reproducible MWE:
set.seed(1)
ABAge = runif(100)
ABPoints = rnorm(100)
ACAge = runif(100)
ACPoints = rnorm(100)
BCAge = runif(100)
BCPoints = rnorm(100)
A_B <- data.frame(ID = as.character(paste0("ID", 1:100)), Age = ABAge, Points = ABPoints)
A_C <- data.frame(ID = as.character(paste0("ID", 1:100)), Age = ACAge, Points = ACPoints)
B_C <- data.frame(ID = as.character(paste0("ID", 1:100)), Age = BCAge, Points = BCPoints)
A_B$ID <- as.character(A_B$ID)
A_C$ID <- as.character(A_C$ID)
B_C$ID <- as.character(B_C$ID)
listFormat <- list("A_B" = A_B, "A_C" = A_C, "B_C" = B_C)
dfFormat <- data.frame(ID = as.character(paste0("ID", 1:100)), A_B.Age = ABAge, A_B.Points = ABPoints, A_C.Age = ACAge, A_C.Points = ACPoints, B_C.Age = BCAge, B_C.Points = BCPoints)
dfFormat$ID = as.character(dfFormat$ID)
This results in a data frame format (dfFormat
) that looks like this:
'data.frame': 100 obs. of 7 variables:
$ ID : chr "ID1" "ID2" "ID3" "ID4" ...
$ A_B.Age : num 0.266 0.372 0.573 0.908 0.202 ...
$ A_B.Points: num 0.398 -0.612 0.341 -1.129 1.433 ...
$ A_C.Age : num 0.6737 0.0949 0.4926 0.4616 0.3752 ...
$ A_C.Points: num 0.409 1.689 1.587 -0.331 -2.285 ...
$ B_C.Age : num 0.814 0.929 0.147 0.75 0.976 ...
$ B_C.Points: num 1.474 0.677 0.38 -0.193 1.578 ...
and a list of data frames listFormat
that looks like this:
List of 3
$ A_B:'data.frame': 100 obs. of 3 variables:
..$ ID : chr [1:100] "ID1" "ID2" "ID3" "ID4" ...
..$ Age : num [1:100] 0.266 0.372 0.573 0.908 0.202 ...
..$ Points: num [1:100] 0.398 -0.612 0.341 -1.129 1.433 ...
$ A_C:'data.frame': 100 obs. of 3 variables:
..$ ID : chr [1:100] "ID1" "ID2" "ID3" "ID4" ...
..$ Age : num [1:100] 0.6737 0.0949 0.4926 0.4616 0.3752 ...
..$ Points: num [1:100] 0.409 1.689 1.587 -0.331 -2.285 ...
$ B_C:'data.frame': 100 obs. of 3 variables:
..$ ID : chr [1:100] "ID1" "ID2" "ID3" "ID4" ...
..$ Age : num [1:100] 0.814 0.929 0.147 0.75 0.976 ...
..$ Points: num [1:100] 1.474 0.677 0.38 -0.193 1.578 ...
I am hoping to come up with an automated way to convert the dfFormat
to listFormat
. As can be seen in the above objects there are two main conditions:
1) If there is a common column (name and contents) in each sublist of listFormat
(in these examples ID
), then they are not repeated in the outputted dfFormat
(in this example, it has one final ID
column),
2) The rest of the column names in sublists of listFormat
become columns in dfFormat
and have names such that they retain their sublist name (i.e "A_B") followed by a dot and then their original column name (i.e. Age), so that it becomes (i.e. "A_B.Age") in the dfFormat
.
I have tried various unlist()
and sapply
codes but have been unsuccessful thus far. What is an efficient way to accomplish this?