1

I am trying to merge two rows by a similar group which I did by looking at different questions on stack overflow (Question1, Qestion2, Question3). All these questions stated what I want but I also have some empty fields in my data frame and I don't want to merge them. I only want to merge the similar/duplicate rows based on Col1 that contain values and not empty or NA. I use below code but it also merges cells that are empty or NA.

merge_my_rows <- df %>%
  group_by(Col1) %>%
  summarise(Col2 = paste(Col2, collapse = ","))

Below please is the sample df and Output df that I want.

Col1 Col2
F212 ALICE
D23 John
C64 NA
F212 BOB
C64 NA
D23 JohnY
D19 Marquis

Output df

Col1 Col2
F212 ALICE, BOB
D23 John, JohnY
C64 NA
C64 NA
D19 Marquis
Darren Tsai
  • 32,117
  • 5
  • 21
  • 51
Usman YousafZai
  • 1,088
  • 4
  • 18
  • 44

2 Answers2

1

You can set a new grouping column, na.grp, that gives each NA in Col2 a unique number and gives those non-NA elements a common number.

library(dplyr)

df %>%
  group_by(Col1, na.grp = ifelse(is.na(Col2), cumsum(is.na(Col2)), 0)) %>%
  summarise(Col2 = paste(Col2, collapse = ", "), .groups = "drop") %>%
  select(-na.grp)

# # A tibble: 5 × 2
#   Col1  Col2       
#   <chr> <chr>      
# 1 C64   NA         
# 2 C64   NA         
# 3 D19   Marquis    
# 4 D23   John, JohnY
# 5 F212  ALICE, BOB 

Data
df <- read.table(text = "
Col1  Col2
F212  ALICE
D23   John
C64   NA
F212  BOB
C64   NA
D23   JohnY
D19   Marquis", header = TRUE)
Darren Tsai
  • 32,117
  • 5
  • 21
  • 51
0

Using reframe

library(dplyr)
df1 %>% 
  reframe(Col2 = if(all(is.na(Col2))) Col2 else toString(Col2[!is.na(Col2)])
      , .by = "Col1")

-output

Col1        Col2
1 F212  ALICE, BOB
2  D23 John, JohnY
3  C64        <NA>
4  C64        <NA>
5  D19     Marquis
akrun
  • 874,273
  • 37
  • 540
  • 662