0

I have a large dataframe in R. I want to merge/select values from a set of columns based on the value of another columns. I have a structure like this:

set.seed(45)
df <- data.frame(Other.Variables = sample(5),
                 A.1 = sample(5),
                 A.2 = sample(5),
                 B.1 = sample(5),
                 B.2 = sample(5),
                 C.1 = sample(5),
                 C.2 = sample(5),
                 Category = as.factor(c("A","B","A","C","B")))
df
  Other.Variables A.1 A.2 B.1 B.2 C.1 C.2 Category
1               4   2   2   3   1   1   2        A
2               2   1   4   4   2   4   5        B
3               1   5   1   1   4   3   4        A
4               3   4   3   5   3   2   3        C
5               5   3   5   2   5   5   1        B

Category is a factor with values (A,B,C,etc.).

My output should be

  Other.Variables  1  2 Category
1               4  2  2        A
2               2  4  2        B
3               1  5  1        A
4               3  2  3        C
5               5  2  5        B

with the values of either A.1, B.1, or C.1, etc. depending on which value Category has. The column names actually contain the names from Category before the dot.

I have been trying to do this with dplyr as I have most of my code using dplyr and pipe operators, but I am also open for other options.

[Edit] I have tried to give a more illustrative example of what kind of data I have. The problem is that Category has 20 different values and I have 15 Columns for each Category.

2 Answers2

0

I have actually found a dplyr solution for my own problem. The trick is to go from wide to long and back to wide format using some gather and spread with a filter at the end. Of course this could also be done with other reshaping functions outside the tydiverse.

Here is my solution:

df %>% 
+   gather(v, value, A.1:C.2) %>% 
+   separate(v, c("var", "col")) %>% 
+   spread(col, value) %>%
+   filter(Category == var) %>%
+   select(-var)

  Other.Variables Category 1 2
1               1        A 5 1
2               2        B 4 2
3               3        C 2 3
4               4        A 2 2
5               5        B 2 5

Maybe that helps someone with similar problems.

0

This can also be solved by using a combination of rowwise + get:

library(tidyverse)

df |>
rowwise() |>
mutate(`1` = get(str_glue("{Category}.1")),
       `2` = get(str_glue("{Category}.2")))
Johan
  • 810
  • 6
  • 12