Is there a tidyverse way to combine rows of the same group, replacing certain values?

I don't want a pivot solution!

This is my dataframe:

df <- structure(list(A = c(1L, 1L, 2L, 3L, 4L, 5L), B = c("a", "a", 
"b", "c", "d", "e"), C = c("u", "t", "t", "u", "t", "t"), D = c("t", 
"u", "u", "t", "u", "u"), E = c("t", "t", "u", "u", "u", "u")), 
class = "data.frame", row.names = c(NA, -6L))

  A B C D E
1 1 a u t t
2 1 a t u t
3 2 b t u u
4 3 c u t u
5 4 d t u u
6 5 e t u u

My desired output:

  A B C D E
1 1 a u u t
2 2 b t u u
3 3 c u t u
4 4 d t u u
5 5 e t u u


Rows 1 and 2 belong to the same group (A = 1, B = a).

This group should be combined into a single row, 1 a, with u replacing t in columns C to E wherever the group contains a u.

Related questions I have studied:

combine rows in data frame containing NA to make complete row

Merging two rows with some having missing values in R

TarJae
  • What's the logic exactly? "u" always overrides "t"? Is that the only rule, or do you need to be able to scale larger than this? And why can't you pivot? – camille Sep 15 '21 at 18:55
  • The logic is to control the overriding parameter. – TarJae Sep 15 '21 at 18:57
  • @camille. Could you please explain why to remove the tidyverse tag. Many thanks! – TarJae Sep 15 '21 at 18:58
  • From [tag:tidyverse] tag info: "DO NOT USE if your question relates to one or two components of the tidyverse, such as dplyr or ggplot2. Use *those* tags, and tag with `r` as well for a better response. Unless your question is about the entirety of the tidyverse package, its installation or its integration with your system, use tags for the packages you are actually using. Using library(tidyverse) is rarely a minimal reproducible example when only library(dplyr) is required." – camille Sep 15 '21 at 20:20
  • Thank you very much for your explanation. I am aware of this information. I faced the following problem: In my solution all packages that are part of the `tidyverse` may be potentially needed: In this case, we may make use of `dplyr`, `tidyr`, `readr`, and `purrr` at least, maybe with some additional helper functions of `stringr` or `forecast`. This is why I use `tidyverse`. Basically I have no problem to remove. But I struggled a lot of times with the task described in my question. And to the best of my knowledge for rows there is not a neat solution...like the use of `coalesce` for columns. – TarJae Sep 15 '21 at 20:41
  • The tidyverse is 29 packages (and counting), most of which are unrelated to the question, especially since you said you don't want to pivot. The answer you accepted only uses 1 of them. I'm still unclear as to what the logic is exactly beyond this one group—you said to control the overriding parameter, but how is that defined? – camille Sep 15 '21 at 21:21

1 Answer


We can group by 'A' and 'B', then summarise across the remaining columns: within each group, order the values so that 'u' comes before any other value, and select the first element.

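library(dplyr)
df %>%
    # one output row per (A, B) group
    group_by(A, B) %>% 
    # . != 'u' is FALSE for 'u', and FALSE sorts first, so 'u' wins if present
    summarise(across(everything(), 
       ~ first(.[order(. != 'u')])), .groups = 'drop')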

Output:

# A tibble: 5 x 5
      A B     C     D     E    
  <int> <chr> <chr> <chr> <chr>
1     1 a     u     u     t    
2     2 b     t     u     u    
3     3 c     u     t     u    
4     4 d     t     u     u    
5     5 e     t     u     u    
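
To see why the ordering trick works, it may help to run it on a single column of a single group. The following is a minimal base R sketch, not taken from the answer itself; the vectors `x` and `y` are illustrative stand-ins for column D and column E of the group A = 1, B = "a".

```r
# Values of column D within the group A = 1, B = "a"
x <- c("t", "u")

# Logical vector: FALSE where the value is "u", TRUE otherwise
is_not_u <- x != "u"

# order() places FALSE before TRUE, so positions holding "u" come first
idx <- order(is_not_u)

# Reordering and taking the first element gives "u" whenever the group has one
first_x <- x[idx][1]
print(first_x)

# Values of column E in the same group: no "u" present.
# All comparisons are TRUE, ties keep their original order,
# so the first value of the group is returned unchanged.
y <- c("t", "t")
first_y <- y[order(y != "u")][1]
print(first_y)
```

Because `order()` leaves ties in their original position, a group without any "u" simply returns its first value, which is why column E stays "t" for the combined row.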
akrun
  • Master, this works perfectly. See here. It is interesting that there is no function for combining two rows and controlling what to replace! Many thanks! – TarJae Sep 15 '21 at 19:08
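
On the point about controlling what gets replaced: one possible way to make the overriding value a parameter is sketched below. This is only an illustration under stated assumptions. `pick_by_priority` and its `priority` argument are hypothetical names, not functions from dplyr or any other package, and the rule "earlier entries in `priority` win" is an assumed reading of the logic discussed in the comments.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(
  A = c(1L, 1L, 2L, 3L, 4L, 5L),
  B = c("a", "a", "b", "c", "d", "e"),
  C = c("u", "t", "t", "u", "t", "t"),
  D = c("t", "u", "u", "t", "u", "u"),
  E = c("t", "t", "u", "u", "u", "u")
)

# Hypothetical helper (not part of any package): return the value of x
# that appears earliest in `priority`. Values absent from `priority`
# get NA from match() and are sorted last by order().
pick_by_priority <- function(x, priority) {
  x[order(match(x, priority))][1]
}

result <- df %>%
  group_by(A, B) %>%
  summarise(
    across(everything(), ~ pick_by_priority(.x, priority = c("u", "t"))),
    .groups = "drop"
  )

print(result)
```

Swapping the argument to `priority = c("t", "u")` would make "t" override "u" instead, so the overriding rule lives in one place rather than inside the comparison.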