1

Below is what my data looks like. My goal is to find all rows that have matching x, z, and m, but different y. And I need to keep both, or all, of the ones that have the differing y.

How can I do that?

x <- c("A","B","C","B","D","E","E")
y <- c(0,10,10,10,10,12,0)
z <- c("A1","B1","B1","B1","B1","C1","C1")
m <- c(rep("2017-12-28",7))

df <- data.frame(x,y,z,m)
df

# Below is the goal
df[6:7,]
Indescribled
  • 320
  • 1
  • 10
  • 1
    What are you comparing to? What is a value of a row the same as or different from? – camille Nov 03 '21 at 00:08
  • Rows 6 and 7 shown in the goal are matching x, z, and m, but different y. So it is everything matching, but different y, as long as all different y rows are kept. – Indescribled Nov 03 '21 at 00:12
  • 1
    You still have not defined whatever you are calling a `match` – GuedesBF Nov 03 '21 at 00:13
  • @GuedesBF - I reckon it's `x/z/m` are used to define a group by their values, and then checking if there are multiple unique `y` inside that `x/z/m` group. – thelatemail Nov 03 '21 at 00:18
  • matching = identical. Row 1 can go because there is only one instance of that specific x, z, and m on that date, so there cannot be different y values. Row 2 can go because while x, z, and m are identical to row 4, y is the same. Row 4 goes for the same reason. Does that help? – Indescribled Nov 03 '21 at 00:18
  • 2
    Pretty sure this is a duplicate if something like `df[ave(df[["y"]], df[c("x","z","m")], FUN=\(x) length(unique(x)) ) > 1,]` does what you want. Maybe https://stackoverflow.com/questions/50953598/r-group-by-count-distinct-values-grouping-by-another-column essentially or https://stackoverflow.com/questions/12840294/counting-unique-distinct-values-by-group-in-a-data-frame? – thelatemail Nov 03 '21 at 00:20
  • That code gave me an error, but I can try something like `df %>% distinct() %>% count(x) %>% filter(n > 1)` But that does not actually get rid of the rows and just tells me one of the three variables to filter on to remove unwanted rows. So I have to save the output x values, repeat the code for z, and m, then filter and it seems convoluted. But it might work – Indescribled Nov 03 '21 at 00:31
  • 1
    The second question I linked would suggest - `df %>% group_by(x,z,m) %>% filter(n_distinct(y) > 1)` – thelatemail Nov 03 '21 at 00:46

1 Answers1

2

base R

df[ave(df$y, df[,c("x","z","m")], FUN = function(y) length(unique(y))) > 1,]
#   x  y  z          m
# 6 E 12 C1 2017-12-28
# 7 E  0 C1 2017-12-28

Note that due to the way ave coerces its return value to the same class as the first argument, if y is something other than numeric or integer, this may not work perfectly as desired.

Also, for code-golf or readability, you can replace the FUN= argument with one of FUN=dplyr::n_distinct or data.table::uniqueN, if you prefer an ave solution and yet have one of those packages loaded.

dplyr

library(dplyr)
df %>%
  group_by(x, z, m) %>%
  filter(n_distinct(y) > 1) %>%
  ungroup()
# # A tibble: 2 x 4
#   x         y z     m         
#   <chr> <dbl> <chr> <chr>     
# 1 E        12 C1    2017-12-28
# 2 E         0 C1    2017-12-28

data.table

library(data.table)
as.data.table(df)[, .SD[uniqueN(y) > 1,], by = .(x, z, m)]
#         x      z          m     y
#    <char> <char>     <char> <num>
# 1:      E     C1 2017-12-28    12
# 2:      E     C1 2017-12-28     0

Data

df <- structure(list(x = c("A", "B", "C", "B", "D", "E", "E"), y = c(0, 10, 10, 10, 10, 12, 0), z = c("A1", "B1", "B1", "B1", "B1", "C1", "C1"), m = c("2017-12-28", "2017-12-28", "2017-12-28", "2017-12-28", "2017-12-28", "2017-12-28", "2017-12-28")), class = "data.frame", row.names = c(NA, -7L))
r2evans
  • 141,215
  • 6
  • 77
  • 149
  • 1
    Not sure the joining back is necessary, you could just `select` if the order is important - `df %>% group_by(x,z,m) %>% filter(n_distinct(y) > 1) %>% select(names(df))` – thelatemail Nov 03 '21 at 00:47
  • I was in the process of testing something like that when you commented, thanks for the final nudge in the right direction :-) – r2evans Nov 03 '21 at 00:48