how to merge data across data.frame rows based on identical column values

Question

How would you merge values between rows that have identical values in id_3? I'm sure there's a better name for the question title but I'm struggling to find the appropriate operation/function name(s) for this procedure.

library(tidyverse)
id_1 <- c("x12", NA, "a_bc", NA)
id_2 <- c(NA, "gye", NA, "ab_c")
id_3 <- c("qwe", "ert", "abc", "abc")
param_1 <- c(0.21, 1.5, 0.23, NA)
param_12 <- c(0.05, 4.4, NA, 6.3)

df <- data.frame(id_1, id_2, id_3, param_1, param_12)
as_tibble(df)

#    id_1  id_2  id_3  param_1 param_12
#   <fct> <fct> <fct>   <dbl>    <dbl>
#  1 x12   NA    qwe      0.21     0.05
#  2 NA    gye   ert      1.5      4.4 
#  3 a_bc  NA    abc      0.23     NA   
#  4 NA    ab_c  abc      NA       6.3

desired df:

#    id_1  id_2  id_3  param_1 param_12
#   <fct> <fct> <fct>   <dbl>    <dbl>
#  1 x12   NA    qwe     0.21     0.05
#  2 NA    gye   ert     1.5      4.4 
#  3 a_bc  ab_c  abc     0.23     6.3

update - with additional column formats such as character:

id_1 <- c("x12", NA, "a_bc", NA)
id_2 <- c(NA, "gye", NA, "ab_c")
id_3 <- c("qwe", "ert", "abc", "abc")
param_1 <- c(0.21, 1.5, 0.23, NA)
param_12 <- c(0.05, 4.4, NA, 6.3)
desc_1 <- c("st", NA, "ko", NA)
desc_2 <- c(NA, "lo", NA, "vf")

df <- data.frame(id_1, id_2, id_3, param_1, param_12, desc_1, desc_2)
df <- df %>% mutate(desc_1 = as.character(desc_1), 
                    desc_2 = as.character(desc_2))

 # A tibble: 4 x 7
 #  id_1  id_2  id_3  param_1 param_12 desc_1 desc_2
 #  <fct> <fct> <fct>   <dbl>    <dbl> <chr>  <chr> 
 #1 x12   NA    qwe      0.21     0.05 st     NA    
 #2 NA    gye   ert      1.5      4.4  NA     lo    
 #3 a_bc  NA    abc      0.23    NA    ko     NA    
 #4 NA    ab_c  abc     NA        6.3  NA     vf 

df <- df %>% group_by(id_3) %>% 
      summarise_all(list(~ if(all(is.na(.))) NA else .[!is.na(.)]

`Error: Column `desc_1` can't promote group 1 to character`

you need group_by() and summarise() from dplyr – Giovana Stein May 29 '19 at 18:47 — Giovana Stein, May 29 '19 at 18:47

score 3 · Accepted Answer · edited Jun 20 '20 at 09:12

3

We can group by 'id_3' and summarise all the columns to return NA if all the values in that particular column is NA for a group or else remove the NA and return the first non-NA element

library(tidyverse)
df %>% 
   group_by(id_3) %>% 
   summarise_all(list(~ if(all(is.na(.))) NA else .[!is.na(.)][1]))
# A tibble: 3 x 5
#  id_3  id_1  id_2  param_1 param_12
#  <fct> <fct> <fct>   <dbl>    <dbl>
#1 abc   a_bc  ab_c     0.23     6.3 
#2 ert   <NA>  gye      1.5      4.4 
#3 qwe   x12   <NA>     0.21     0.05

-Rstudio

Update

For the second dataset, after the columns were converted to character class, OP encountered error with if(all(is.na(.))) NAl. If the columns are of same type, we can specify NA_character, NA_real_, NA_integer_ to dispatch the correct NA for each type. Here, we cannot do that, but there is one hacky option to return the 1st NA element in that column which will also have the correct type

df %>% 
   group_by(id_3) %>% 
   summarise_all(list(~ if(all(is.na(.))) .[!is.na(.)][1] 
           else .[!is.na(.)]))
# A tibble: 3 x 7
#  id_3  id_1  id_2  param_1 param_12 desc_1 desc_2
#  <fct> <fct> <fct>   <dbl>    <dbl> <chr>  <chr> 
#1 abc   a_bc  ab_c     0.23     6.3  ko     vf    
#2 ert   <NA>  gye      1.5      4.4  <NA>   lo    
#3 qwe   x12   <NA>     0.21     0.05 st     <NA>

edited Jun 20 '20 at 09:12

Community

1
1

answered May 29 '19 at 18:46

akrun

874,273
37
540
662

I think there may be a missing paranthesis (RStudio saying unexpected token/unmatched bracket), but I'm unable to edit myself. – dbo May 29 '19 at 19:47
@doconnor I checked by copy/pasting the solution. It is working for me though (in R console) – akrun May 29 '19 at 19:49
@doconnor I am using `packageVersion('dplyr')# [1] ‘0.8.1’` on `R 3.6.0` – akrun May 29 '19 at 19:52
it runs fine in RStudio but I get the red debug Xs - I'm using the same `dplyr` version but `R 3.5.1`. Once I can get admin permissions I'll update and retry. – dbo May 29 '19 at 20:08
@doconnor It could be that issue. I recently updated from 3.5.2 to 3.6 and there are some changes here and there – akrun May 29 '19 at 20:09
for the record - I updated to `3.6.0` and get the same thing - code runs but RStudio wants another paranthesis somehow. – dbo May 29 '19 at 22:07
@doconnor Let me check in Rstudio – akrun May 29 '19 at 22:11
@doconnor You can see that it is working for me in rstudio version `Version 1.2.1522` – akrun May 29 '19 at 22:13
1

@doconnor I think I downloaded the preview version to be in the bleeding edge – akrun May 29 '19 at 22:22
if it's an easy guess @akrun - if there were more columns with other formats, I can't see why your block wouldn't work? Trying on my non-example data.frame with additional character and date_time columns, I get `Error: Column 'my_column' can't promote group 1 to character` – dbo May 29 '19 at 22:42
@doconnor Can you please update your post with an example so that I can test it – akrun May 29 '19 at 22:43
just updated above. it works if I leave my descriptive columns like `desc_1` and `desc_2` as factors before running @akrun's block, so for my non example data.frame columns that are in character format to begin with, it seems I could `as.factor()` them and then run @akrun's block, unless there is a cleaner way. – dbo May 29 '19 at 23:14
@doconnor I tried your example without converting to `character` and it is working fine. Did you meant that you want to convert to `character` class – akrun May 29 '19 at 23:21
still having some trouble on my non-example data, I posted a follow up example here - https://stackoverflow.com/questions/56382379/how-to-merge-data-across-data-frame-rows-based-on-identical-column-values-with – dbo May 30 '19 at 16:49

how to merge data across data.frame rows based on identical column values

1 Answers1

Update

Linked