You have a lot of what I might call metadata encoded in those column names that you want to use to match them. Therefore I suggest using tidyr::pivot_longer()
to get that metadata out of column names and into the actual data.frame
and then parse it out so you can match the each record with the sum you want to divide by.
I used tidyr::unite()
to restore your original column names and tidyr::pivot_wider()
to put it back into 'wide' format.
Note on the first separate()
call I used a regex
that splits on the last occurrence of _
since some of them have the word sum
and some don't.
library(tidyverse)
df <- data.frame(n_val1 =c(1,2,1,1), n_val2 = c(2,3,5,1), p_val1 = c(2,4,5,1), p_val2 = c(2,3,1,1), n_sum_val1 = c(5,4,2,4), n_sum_val2 = c(6,7,8,9), p_sum_val1 = c(5, 9, 1, 3), p_sum_val2 = c(6,8,2,4) )
df %>%
mutate(id = row_number()) %>%
pivot_longer(-id) %>%
separate(name, into = c("a", "c"), sep = "_(?!(?s:.*)_)") %>%
separate(a, into = c("a", "b"), sep = "_") %>%
separate(c, into = c("c", "d"), sep = 3) %>%
mutate(b = case_when(
is.na(b) ~ "numerator",
T ~ "denominator")) %>%
pivot_wider(names_from = b, values_from = value) %>%
mutate(percent = numerator/denominator) %>%
pivot_longer(numerator:denominator, names_to = "b") %>%
mutate(b = case_when(b == "denominator" ~ "sum")) %>%
unite("name", c(a, b, c, d), na.rm = T) %>%
pivot_wider(values_from = c(value, percent), names_from = name, names_glue = "{name}_{.value}")
#> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 16 rows [1, 2, 3,
#> 4, 9, 10, 11, 12, 17, 18, 19, 20, 25, 26, 27, 28].
#> # A tibble: 4 x 17
#> id n_val_1_value n_sum_val_1_value n_val_2_value n_sum_val_2_value
#> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 5 2 6
#> 2 2 2 4 3 7
#> 3 3 1 2 5 8
#> 4 4 1 4 1 9
#> # ... with 12 more variables: p_val_1_value <dbl>, p_sum_val_1_value <dbl>,
#> # p_val_2_value <dbl>, p_sum_val_2_value <dbl>, n_val_1_percent <dbl>,
#> # n_sum_val_1_percent <dbl>, n_val_2_percent <dbl>,
#> # n_sum_val_2_percent <dbl>, p_val_1_percent <dbl>,
#> # p_sum_val_1_percent <dbl>, p_val_2_percent <dbl>, p_sum_val_2_percent <dbl>
Created on 2022-03-10 by the reprex package (v2.0.1)