I have an extremely wide dataset that I am trying to unpivot to a degree but not completely. Essentially I am trying to group certain columns together based on a string before an underscore and pivot on those groups individually. My current method uses two opposite pivots, a for loop, and an intermediate list to accomplish my goal. I am able to get my final product but for my own knowledge, I am wondering if there is a more elegant solution. I realize that I am likely not explaining things well so I have recreated the scenario with a dummy dataset.
#Required packages
library(tidyverse)
#Dummy data
file <- as_tibble(data.frame(id = c("QQQ", "WWW", "EEE", "RRR", "TTT"),
state = c("aa", "bb", "cc", "dd", "ee"),
city = c("ff", "gg", "hh", "ii", "jj"),
a_1 = runif(5),
a_2 = runif(5),
a_3 = runif(5),
a_4 = runif(5),
a_5 = runif(5),
a_6 = runif(5),
a_7 = runif(5),
a_8 = runif(5),
a_9 = runif(5),
a_10 = runif(5),
b_1 = runif(5),
b_2 = runif(5),
b_3 = runif(5),
b_4 = runif(5),
b_5 = runif(5),
b_6 = runif(5),
b_7 = runif(5),
b_8 = runif(5),
b_9 = runif(5),
b_10 = runif(5),
c_1 = runif(5),
c_2 = runif(5),
c_3 = runif(5),
c_4 = runif(5),
c_5 = runif(5),
c_6 = runif(5),
c_7 = runif(5),
c_8 = runif(5),
c_9 = runif(5),
c_10 = runif(5)))
#My solution
longer <- file %>%
pivot_longer(cols = c(-id:-city),
names_to = c(".value", "section"),
names_pattern = "(.+)_([0-9]+$)"
)
num_letterGroup <- ncol(longer) - 4 #4 is the number of columns i want to retain
wide_list <- vector(mode = "list", length = num_letterGroup)
name_list <- vector(mode = "character", length = num_letterGroup)
for (i in 1:num_letterGroup) {
col_num <- 4 + i
col_name <- colnames(longer)[col_num]
wide <- longer %>%
select(1:4, all_of(col_name)) %>%
pivot_wider(names_from = section, values_from = col_name) %>%
mutate(letterGroup = col_name)
wide_list[[i]] <- wide
name_list[i] <- col_name
}
names(wide_list) <- name_list
wide_df <- bind_rows(wide_list)
I realize that the amount of data given might seem excessive but I needed the column numbers to be sequential as well as reach double digits. Thank you in advance for any assistance you can provide.
EDIT TO CLARIFY: wide_df is the final product that I want