Aggregate rows with multiple conditions

Question

I have a df as below

df <- data.frame(col1 = c("a", "a", "b",
                          "b", "c", "c"),
                 col2 = c("x1", "x1.1", "x2", "x2.1", "x3", "x3.1"),
                 col3 = c(1, NA, 2, NA, 3, NA),
                 col4 = c(NA, 1, NA, 2, NA, 3))
df
  col1 col2 col3 col4
1    a   x1    1   NA
2    a x1.1   NA    1
3    b   x2    2   NA
4    b x2.1   NA    2
5    c   x3    3   NA
6    c x3.1   NA    3

I want to merge rows that have the same letter in column col1 and filter rows in column col2 by telling them col2 %in% c(x1,x1.1) & col2 %in% c(x2,x2.1) & col3 %in% (x3,x3.1), simulatenously.

My desired output would be:

  col1 col2 col3 col4
1    a   x1    1   1
2    b   x2    2   2
3    c   x3    3   3

One solution from my side is to call if x == "x1", then col4 will be filled by values assosicated with x == "x1.1"

Any suggestions for this to group_by col1? Thank you in advance!

Additional note

I did draw a pic for visualization, that I think would be easier for you to imagine.

The values in the actual dataset are different from the example here.

Updated solution I found a solution with the help of akrun see here: Use ~separate after mutate and across

df |> 
  mutate(col2 = substring(col2, 1,2)) |> 
  mutate_if(is.numeric, ~replace(., is.na(.), "")) |> 
  group_by(col1, col2) |> 
  summarise(across(c(col3, col4), ~toString(.)), .groups = "drop") |> 
  mutate(col3 = str_remove(col3, ",")) |> 
  mutate(col4 = str_remove(col4, ", "))

I'm still open to further suggestions, if anyone has any.

score 1 · Answer 1 · answered Nov 29 '22 at 14:17

1

Not sure if I'm missing something, but for your example data the approach below works:

library(dplyr)

df %>% 
  mutate(col2 = sub("\\.[0-9]+$", "", col2),
         col3 = coalesce(col3, col4),
         col4 = coalesce(col4, col3)) %>% 
  distinct()
#>   col1 col2 col3 col4
#> 1    a   x1    1    1
#> 2    b   x2    2    2
#> 3    c   x3    3    3

^{Created on 2022-11-29 by the reprex package (v2.0.1)}

answered Nov 29 '22 at 14:17

TimTeaFan

17,549
4
18
39

I think I confused you. Let's me edit my question. The output is what exactly I need but like my expectation. – Anh Nov 30 '22 at 08:38

score 0 · Answer 2 · answered Nov 29 '22 at 14:10

After extracting only the first to characters of col2, you can use coalesce to bring col3 and col4 together. Then replacing NA by data from the coalesced column and using distinct at the end will lead to the desired output.

library(tidyverse)

df <- data.frame(col1 = c("a", "a", "b",
                          "b", "c", "c"),
                 col2 = c("x1", "x1.1", "x2", "x2.1", "x3", "x3.1"),
                 col3 = c(1, NA, 2, NA, 3, NA),
                 col4 = c(NA, 1, NA, 2, NA, 3))


df |> 
  mutate(col2 = substring(col2, 1,2)) |> 
  mutate(mycol = coalesce(col3, col4)) |>
  mutate(col3 = if_else(is.na(col3), mycol, col3),
         col4 = ifelse(is.na(col4), mycol, col4)) |> 
  select(-c(mycol)) |> 
  distinct()
#>   col1 col2 col3 col4
#> 1    a   x1    1    1
#> 2    b   x2    2    2
#> 3    c   x3    3    3

Let's me edit my question. Something is missing here – Anh Nov 30 '22 at 08:38 — Anh, Nov 30 '22 at 08:38

Aggregate rows with multiple conditions

2 Answers2