Using this R function I am able to see the matching ratio for row clusters:

# Matching ratio function
match_ratio <- function(x)
  cbind(x, match_ratio = rowMeans(mapply(`==`, x[1, -1], x[, -1])))

I would also like to label each cell as True if its value is the same across every row of its cluster, and False if it is not. Any suggestions? Thanks.

Sample input is

ID  Var1  Var2
1   East  Juice
1   East  Soda
2   West  Apple
2   East  Apple  

Sample Output would be

ID  Var1   Var2
1   True   False
1   True   False
2   False  True
2   False  True

The clusters are defined by ID: the rows with ID 1 form one cluster and the rows with ID 2 form another.
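For anyone who wants to run this, the sample input can be rebuilt as a data frame and the function applied once per cluster. This is only a sketch: the name `df`, the character column types, and the `split()`/`lapply()` step for applying `match_ratio` per ID are assumptions, not something stated in the question.

```r
# Sample input from the question, rebuilt as a data frame (column types assumed)
df <- data.frame(
  ID   = c(1, 1, 2, 2),
  Var1 = c("East", "East", "West", "East"),
  Var2 = c("Juice", "Soda", "Apple", "Apple"),
  stringsAsFactors = FALSE
)

# Matching ratio function from the question: compares every row of x
# with the first row of x, column by column (the ID column is skipped)
match_ratio <- function(x)
  cbind(x, match_ratio = rowMeans(mapply(`==`, x[1, -1], x[, -1])))

# Assumed usage: apply the function to each ID cluster, then stack the pieces
res <- do.call(rbind, lapply(split(df, df$ID), match_ratio))
res
```

Note that the ratio is measured against the first row of each cluster, which is a different question from whether all values in a cluster agree.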

shj997
  • What row clusters are you talking about? You should include a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output. – MrFlick Mar 23 '17 at 18:16
  • Please see edit. Thanks. – shj997 Mar 23 '17 at 18:57

1 Answer

A dplyr method: group by ID, then mark each column TRUE wherever the group contains exactly one distinct value:

library(dplyr)

df %>% group_by(ID) %>% mutate_all(funs(n_distinct(.) == 1))

     ID  Var1  Var2
  <int> <lgl> <lgl>
1     1  TRUE FALSE
2     1  TRUE FALSE
3     2 FALSE  TRUE
4     2 FALSE  TRUE
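Since this answer was written, `funs()` has been deprecated and `mutate_all()` superseded by `across()`. A roughly equivalent sketch for newer dplyr (assuming version 1.0.0 or later, and assuming `df` holds the sample input with character columns) might look like this:

```r
library(dplyr)

# Sample input from the question (column types are an assumption)
df <- data.frame(
  ID   = c(1, 1, 2, 2),
  Var1 = c("East", "East", "West", "East"),
  Var2 = c("Juice", "Soda", "Apple", "Apple"),
  stringsAsFactors = FALSE
)

flags <- df %>%
  group_by(ID) %>%
  # across(everything()) skips the grouping column, so ID is left as-is;
  # each other column becomes TRUE where the group has one distinct value
  mutate(across(everything(), ~ n_distinct(.x) == 1)) %>%
  ungroup()

flags
```

The `ungroup()` at the end is optional; it just returns a plain tibble so that later steps are not silently computed per ID.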
Nate
  • Thanks! This works perfectly. Is there also a way to do partial matches, say if the cells being compared share 80% of their letters? Thanks again. – shj997 Mar 23 '17 at 21:10
  • The strategy for that depends on how you define a "fuzzy" match. It's likely easier to do this type of data cleaning before you get to the tabulating steps. – Nate Mar 24 '17 at 13:05
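As one possible reading of the "80%" idea in the comments above, the sketch below treats two strings as matching when their edit-distance similarity is at least 0.8. Everything here is an assumption for illustration: the helper `all_similar`, the similarity definition (1 minus Levenshtein distance divided by the longer string's length), the threshold, and the made-up data. It assumes dplyr 1.0.0 or later and uses `adist()` from base R's utils package.

```r
library(dplyr)

# Hypothetical helper (not from any package): TRUE when every pair of
# strings in x has similarity >= threshold, where
# similarity = 1 - Levenshtein distance / length of the longer string
all_similar <- function(x, threshold = 0.8) {
  x <- as.character(x)
  d <- utils::adist(x)                                # pairwise edit distances
  longer <- pmax(outer(nchar(x), nchar(x), pmax), 1)  # avoid dividing by zero
  all(1 - d / longer >= threshold)
}

# Made-up data for illustration only
fuzzy_df <- data.frame(
  ID   = c(1, 1, 2, 2),
  Var1 = c("Color", "Colour", "East", "West"),
  stringsAsFactors = FALSE
)

fuzzy_flags <- fuzzy_df %>%
  group_by(ID) %>%
  mutate(across(everything(), all_similar)) %>%  # one flag per group, recycled
  ungroup()

fuzzy_flags
```

Other definitions (shared letters regardless of order, phonetic codes, and so on) would need a different helper, and missing values are not handled here.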