In R: choosing specific values from two rows sharing unique IDs but keeping only one row

Question

I have a data frame `dummy` in R that looks like this:

dummy <- data.frame(a = c("b01", "b01", "b02"), 
                    id = c(456, 456, 233),
                    id2 = c(888, 888, 889), 
                    t = c("neg", "no", "pos"),
                    j = c("no", "no", "no"), 
                    y = c("pos", "no", "neg"),  
                    q = c("pos", "no", "no"),  
                    w = c("asd", "asd", "sdf"))

#     a  id id2   t   j   y   q   w
# 1 b01 456 888 neg  no pos pos asd
# 2 b01 456 888  no  no  no  no asd
# 3 b02 233 889 pos  no neg  no sdf

I want to collapse the rows that share the same a, id, and id2 into a single row. For each of the columns t, j, y, and q, I want to keep neg or pos if it appears in either of the rows, and no only if both rows are no.

I've tried:

library(dplyr)
z <- dummy %>%  
     group_by(a, id, id2) %>%  
     summarise(  
          t = paste(t, collapse = "-"),  
          j = paste(j, collapse = "-"),  
          y = paste(y, collapse = "-"),  
          q = paste(q, collapse = "-"))

This works (after removing the unwanted text with gsub), but column w is then dropped.

The desired data frame would look like this:

#     a  id id2   t   j   y   q   w
# 1 b01 456 888 neg  no pos pos asd  
# 3 b02 233 889 pos  no neg  no sdf

Any help would be appreciated.
I've also looked at:
(Collapse text by group in data frame) and (dplyr summarise: Equivalent of ".drop=FALSE" to keep groups with zero length in output)

score 1 · Accepted Answer · answered May 31 '16 at 16:10

Here's a modification of your original approach using summarise() instead of mutate() so you can do it all in one step.

# For each column: if the group has a single distinct value, keep it;
# otherwise take the first value that is not "no".
# w is assumed to be constant within each group.
z <- dummy %>%  
     group_by(a, id, id2) %>%  
     summarise(t = ifelse(length(unique(t))==1, as.character(unique(t)), as.character(t[which(t!="no")])),
               j = ifelse(length(unique(j))==1, as.character(unique(j)), as.character(j[which(j!="no")])),
               y = ifelse(length(unique(y))==1, as.character(unique(y)), as.character(y[which(y!="no")])),
               q = ifelse(length(unique(q))==1, as.character(unique(q)), as.character(q[which(q!="no")])),
               w = unique(w))

Result:

> z
Source: local data frame [2 x 8]
Groups: a, id [?]

       a    id   id2     t     j     y     q      w
  (fctr) (dbl) (dbl) (chr) (chr) (chr) (chr) (fctr)
1    b01   456   888   neg   neg   pos   pos    asd
2    b02   233   889   pos    no   neg    no    sdf

If the variables in your data frame are strings instead of factors, you can get rid of all the as.character() bits in that code.
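
As a rough sketch of that string-based variant, the repeated ifelse() calls could be folded into a small helper. This assumes dplyr 1.0.0 or later (for across() and the .groups argument) and that w is constant within each group; pick_result is a hypothetical name, not something from dplyr.

```r
library(dplyr)

# Data from the question, with character columns instead of factors.
dummy <- data.frame(a = c("b01", "b01", "b02"),
                    id = c(456, 456, 233),
                    id2 = c(888, 888, 889),
                    t = c("neg", "no", "pos"),
                    j = c("no", "no", "no"),
                    y = c("pos", "no", "neg"),
                    q = c("pos", "no", "no"),
                    w = c("asd", "asd", "sdf"),
                    stringsAsFactors = FALSE)

# Hypothetical helper: return the first value that is not "no",
# or "no" when every value in the group is "no".
pick_result <- function(x) {
  hits <- x[x != "no"]
  if (length(hits) > 0) hits[1] else "no"
}

z <- dummy %>%
  group_by(a, id, id2) %>%
  summarise(across(c(t, j, y, q), pick_result),
            w = first(w),          # assumes w does not vary within a group
            .groups = "drop")

print(z)
```

With `dummy` as defined in the question, every `j` value is `no`, so `j` comes out as `no` for `b01`. If a group ever held both `neg` and `pos` in the same column, this helper would silently keep whichever comes first, so that case may need its own rule.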

Thank you! I tried it but the `j` variable should throw a `no` instead of a `neg` for `a==b01`. Maybe I'm missing something? — Dania, May 31 '16 at 16:33
Your sample answer shows a "no" there, but if you apply your rules to the version of `dummy` your code makes, you should get a "neg" there, because `j=="neg"` for one of the rows where `a==b01 & id==456 & id2==888`. — ulfelder, May 31 '16 at 16:58

score 0 · Answer 2 · answered May 31 '16 at 15:53

Got it!
Instead of summarise(), I used mutate() and all worked fine.
Thanks everyone!
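
The answer does not show its code, so the following is only a guess at what a mutate()-based version might look like. It assumes dplyr 1.0.0 or later and character columns; pick_result is a hypothetical helper. Because mutate() keeps every column, w survives, and distinct() then collapses the now-identical rows.

```r
library(dplyr)

# Data from the question, with character columns instead of factors.
dummy <- data.frame(a = c("b01", "b01", "b02"),
                    id = c(456, 456, 233),
                    id2 = c(888, 888, 889),
                    t = c("neg", "no", "pos"),
                    j = c("no", "no", "no"),
                    y = c("pos", "no", "neg"),
                    q = c("pos", "no", "no"),
                    w = c("asd", "asd", "sdf"),
                    stringsAsFactors = FALSE)

# Hypothetical helper: first value that is not "no", else "no".
pick_result <- function(x) {
  hits <- x[x != "no"]
  if (length(hits) > 0) hits[1] else "no"
}

z <- dummy %>%
  group_by(a, id, id2) %>%
  # The length-1 result is recycled to every row of the group.
  mutate(across(c(t, j, y, q), pick_result)) %>%
  ungroup() %>%
  # Rows within a group are now identical, so keep one of each.
  distinct()

print(z)
```

Note that distinct() only collapses a group to one row if the untouched columns (here w) are the same across that group's rows.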

answered by Dania
