I am hoping to determine an efficient way to convert a single data frame into a list of data frames. Below is my reproducible MWE:
set.seed(1)
ABAge = runif(100)
ABPoints = rnorm(100)
ACAge = runif(100)
ACPoints = rnorm(100)
BCAge = runif(100)
BCPoints = rnorm(100)
A_B <- data.frame(ID = as.character(paste0("ID", 1:100)), Age = ABAge, Points = ABPoints)
A_C <- data.frame(ID = as.character(paste0("ID", 1:100)), Age = ACAge, Points = ACPoints)
B_C <- data.frame(ID = as.character(paste0("ID", 1:100)), Age = BCAge, Points = BCPoints)
A_B$ID <- as.character(A_B$ID)
A_C$ID <- as.character(A_C$ID)
B_C$ID <- as.character(B_C$ID)
listFormat <- list("A_B" = A_B, "A_C" = A_C, "B_C" = B_C)
dfFormat <- data.frame(ID = as.character(paste0("ID", 1:100)), A_B.Age = ABAge, A_B.Points = ABPoints, A_C.Age = ACAge, A_C.Points = ACPoints, B_C.Age = BCAge, B_C.Points = BCPoints)
dfFormat$ID = as.character(dfFormat$ID)
This results in a data frame format (dfFormat
) that looks like this:
'data.frame': 100 obs. of 7 variables:
$ ID : chr "ID1" "ID2" "ID3" "ID4" ...
$ A_B.Age : num 0.266 0.372 0.573 0.908 0.202 ...
$ A_B.Points: num 0.398 -0.612 0.341 -1.129 1.433 ...
$ A_C.Age : num 0.6737 0.0949 0.4926 0.4616 0.3752 ...
$ A_C.Points: num 0.409 1.689 1.587 -0.331 -2.285 ...
$ B_C.Age : num 0.814 0.929 0.147 0.75 0.976 ...
$ B_C.Points: num 1.474 0.677 0.38 -0.193 1.578 ...
and a list of data frames listFormat
that looks like this:
List of 3
$ A_B:'data.frame': 100 obs. of 3 variables:
..$ ID : chr [1:100] "ID1" "ID2" "ID3" "ID4" ...
..$ Age : num [1:100] 0.266 0.372 0.573 0.908 0.202 ...
..$ Points: num [1:100] 0.398 -0.612 0.341 -1.129 1.433 ...
$ A_C:'data.frame': 100 obs. of 3 variables:
..$ ID : chr [1:100] "ID1" "ID2" "ID3" "ID4" ...
..$ Age : num [1:100] 0.6737 0.0949 0.4926 0.4616 0.3752 ...
..$ Points: num [1:100] 0.409 1.689 1.587 -0.331 -2.285 ...
$ B_C:'data.frame': 100 obs. of 3 variables:
..$ ID : chr [1:100] "ID1" "ID2" "ID3" "ID4" ...
..$ Age : num [1:100] 0.814 0.929 0.147 0.75 0.976 ...
..$ Points: num [1:100] 1.474 0.677 0.38 -0.193 1.578 ...
I am hoping to come up with an automated way to convert the dfFormat
to listFormat
. As can be seen in the above objects there are two main conditions:
The column
ID
is always the first column indfFormat
and it is always the first column in each sublist oflistFormat
.The number of sublists is equal to the number of unique column names in
dfFormat
before the underscore ('_'). In this case, that is three prefixes (ex. "A_B", "A_C", and "B_C"). These prefixes are also the names of the three sublists.Within each sublist, it contains the number of columns that had the associated prefix ("A_B"). For each sublist, this was two ("Age" and "Points"). These suffixes are the names of the columns.
I asked the reverse question here (i.e. how to go from listFormat
to dfFormat
) and got some helpful answers that I am learning from. I need to have code to reverse both directions and it seems the reverse direction may need new types of code. I put my attempt below to show how I am stuck!
conUnd <- which(sapply(colnames(dfFormat), function(x) grepl("_", x)))
listName <- sapply(colnames(dfFormat[,conUnd]), function(x) strsplit(x, "[.]")[[1]][1])
uListName <- unique(sapply(colnames(dfFormat[,conUnd]), function(x) strsplit(x, "[.]")[[1]][1]))
listCol <- sapply(colnames(dfFormat[,conUnd]), function(x) strsplit(x, "[.]")[[1]][2])
listFormat = list()
for (i in 1:length(uListName)){
[Gets messy here trying to define column names based on string variables]
}
Any advice would be greatly appreciated. I know my code is not efficient.