I am working with the iris dataset, and manipulating it as follows to get a species, feature1, feature2, value data frame:

library(dplyr)
library(tidyr)

gatherpairs <- function(data, ..., 
                        xkey = '.xkey', xvalue = '.xvalue',
                        ykey = '.ykey', yvalue = '.yvalue',
                        na.rm = FALSE, convert = FALSE, factor_key = FALSE) {
  vars <- quos(...)
  xkey <- enquo(xkey)
  xvalue <- enquo(xvalue)
  ykey <- enquo(ykey)
  yvalue <- enquo(yvalue)

  data %>% {
    cbind(gather(., key = !!xkey, value = !!xvalue, !!!vars,
                 na.rm = na.rm, convert = convert, factor_key = factor_key),
          select(., !!!vars)) 
  } %>% gather(., key = !!ykey, value = !!yvalue, !!!vars,
               na.rm = na.rm, convert = convert, factor_key = factor_key) %>%
    filter(!(.xkey == .ykey)) %>%
    mutate(var = apply(.[, c(".xkey", ".ykey")], 1, function(x) paste(sort(x), collapse = ""))) %>%
    arrange(var)
}

test = iris %>% 
         gatherpairs(sapply(colnames(iris[, -ncol(iris)]), eval))

This was taken from https://stackoverflow.com/a/47731111/8315659

This gives me a data frame with all combinations of feature1 and feature2, but I want to remove the duplicates that are just the same pair in reverse order. For example, Petal.Length vs Petal.Width is the same as Petal.Width vs Petal.Length. However, if two rows have identical values for Petal.Length vs Petal.Width, I do not want to drop either of them. In other words, I only want to drop rows that duplicate another row with .xkey and .ykey swapped. Essentially, this is just to recreate the bottom triangle of the ggplot matrix shown in the linked answer above.

How can this be done?

pogibas
Jack Arnestad

1 Answer

I think this could be accomplished using only the first part of the source code, which performs a single gathering operation. Using the iris example, this produces 600 rows of output, one for each of the 150 rows × 4 feature columns in iris.

gatherpairs <- function(data, ..., 
                        xkey = '.xkey', xvalue = '.xvalue',
                        ykey = '.ykey', yvalue = '.yvalue',
                        na.rm = FALSE, convert = FALSE, factor_key = FALSE) {
  vars <- quos(...)
  xkey <- enquo(xkey)
  xvalue <- enquo(xvalue)
  ykey <- enquo(ykey)
  yvalue <- enquo(yvalue)

  data %>% {
    cbind(gather(., key = !!xkey, value = !!xvalue, !!!vars,
                 na.rm = na.rm, convert = convert, factor_key = factor_key),
          select(., !!!vars)) 
  } # %>% gather(., key = !!ykey, value = !!yvalue, !!!vars,
    #            na.rm = na.rm, convert = convert, factor_key = factor_key)%>% 
    # filter(!(.xkey == .ykey)) %>%
    # mutate(var = apply(.[, c(".xkey", ".ykey")], 1, function(x) paste(sort(x), collapse = ""))) %>%
    # arrange(var)
}
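
If the second gather is still needed, one possible alternative (an assumption on my part, not something taken from the linked answer) is to keep both gathers and then filter on the key columns only, retaining a single ordering of each feature pair. The sketch below is standalone and does not call gatherpairs; the names features, single, and one_triangle are only illustrative. Because the filter never looks at .xvalue or .yvalue, rows that happen to share identical values are not dropped.

```r
library(dplyr)
library(tidyr)

# Feature columns of iris (everything except Species)
features <- setdiff(names(iris), "Species")

# First gather only, with the original feature columns bound back on
# (cbind recycles the 150 original rows to match the gathered rows)
single <- cbind(
  gather(iris, key = ".xkey", value = ".xvalue", all_of(features)),
  select(iris, all_of(features))
)
nrow(single)  # 150 observations x 4 features

# Second gather, then keep only one ordering of each feature pair.
# The strict comparison also removes rows where .xkey == .ykey.
one_triangle <- single %>%
  gather(key = ".ykey", value = ".yvalue", all_of(features)) %>%
  filter(.xkey < .ykey)

# Number of distinct (x, y) feature pairs that remain
nrow(distinct(one_triangle, .xkey, .ykey))
```

Whether this ends up as the lower or the upper triangle depends on how the facets are ordered in the plot; flipping the comparison to .xkey > .ykey gives the other half.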
Jon Spring