
I often find myself comparing two character vectors (typically columns in two different data frames) to see where they don't match up. Because I do this often, I want to write a function to make it easier. This is what I've come up with so far:

x <- c("A", "B", "C")
y <- c("B", "C", "D", "X")


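# %notin% is not in base R; define it as the negation of %in%
`%notin%` <- Negate(`%in%`)

check_mismatch <- function(vec1, vec2) {
  vec1 <- unique(as.character(vec1))
  vec2 <- unique(as.character(vec2))
  missing_from_1 <- vec2[vec2 %notin% vec1]  # in vec2 but not in vec1
  missing_from_2 <- vec1[vec1 %notin% vec2]  # in vec1 but not in vec2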
  print("Missing from vector 1")
  print(missing_from_1)
  print("Missing from vector 2")
  print(missing_from_2)
}

check_mismatch(x,y)


[1] "Missing from vector 1"
[1] "D" "X"
[1] "Missing from vector 2"
[1] "A"

What I would really like is "Missing from x" instead of "Missing from vector 1". I would like the function to output the name of the actual argument that was entered. Another example of how I would like the function to work:

check_mismatch(all_polygons_df$Plot, sb_year$Plot)

[1] "Missing from all_polygons_df$Plot"
[1] "KWI-1314B"
[1] "Missing from sb_year$Plot"
character(0)

Any suggestions on how I could do this? I'm open to other ways of displaying the output too, perhaps some kind of table, but the output needs to cope with different numbers of mismatches for each vector.

canderson156
  • This will most likely answer your question: https://stackoverflow.com/questions/10520772/in-r-how-to-get-an-objects-name-after-it-is-sent-to-a-function – Jan Jun 25 '20 at 19:27
  • (1) @Jan's comment is spot-on, use `deparse(substitute(vec1))`. (2) Your function prints everything but only *returns* the data from `missing_from_2`, rendering the function mostly useless other than in console side-effect. How about returning `list(missing_from_1, missing_from_2)`? (3) There's a slight difference (to me) between `print`ing your helper message `"Missing from"` and providing the actual data. `print` has a vector-smell to it; you might consider `message` or `cat` for presentation on the console (and then return a value that's meaningful). – r2evans Jun 25 '20 at 19:31
  • For more information about non-standard evaluation, have a look at the metaprogramming chapter in [Advanced R](https://adv-r.hadley.nz/metaprogramming.html) – starja Jun 25 '20 at 19:31

1 Answer


Up front, deparse(substitute(...)) is what you're asking for, and that is what makes your initial question a duplicate.
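To see the mechanism on its own, here is a minimal sketch (arg_label and too_late are hypothetical names, and df is a made-up data frame). substitute() returns the unevaluated expression behind an argument, and deparse() turns that expression into a string. The second function shows a pitfall: once the argument has been reassigned inside the function it is no longer a promise, so substitute() returns its value instead of the caller's expression. That is why the names should be captured before anything else happens to the arguments.

```r
# Minimal sketch: recover the caller's expression for an argument
arg_label <- function(v) {
  # substitute() sees the unevaluated promise; deparse() turns it into text
  deparse(substitute(v))
}

# Same idea, but `v` is reassigned first: it is now an ordinary local
# variable, so substitute() returns its value, not the original expression
too_late <- function(v) {
  v <- unique(v)
  deparse(substitute(v))
}

df <- data.frame(Plot = c("A", "B", "B"), stringsAsFactors = FALSE)
arg_label(df$Plot)  # [1] "df$Plot"
too_late(df$Plot)   # [1] "c(\"A\", \"B\")"
```

For long expressions deparse() can return several strings, so wrapping it in paste(..., collapse = "") is a common safeguard when the result is used as a name.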

Some recommendations, however:

  1. Printing to the console with print is a little off (IMO), since it prepends [1] to every vector it prints. Consider message (or cat) instead. Since many R environments color comments differently, I have found it useful to prepend # to some text to set it apart from other portions of the output.

  2. Your function operates solely by side effect: it prints something to the console, and then the results are lost. It does happen to return a single object (accidentally, the value of missing_from_2, invisibly, because print returns its argument invisibly), but it would be more useful if the function returned all of the mismatches.

    With that, I offer an alternative:

    check_mismatch <- function(vec1, vec2) {
      # capture the argument expressions before vec1/vec2 are overwritten below
      nm1 <- deparse(substitute(vec1))
      nm2 <- deparse(substitute(vec2))
      vec1 <- unique(as.character(vec1))
      vec2 <- unique(as.character(vec2))
      missing_from_1 <- vec2[!vec2 %in% vec1]
      missing_from_2 <- vec1[!vec1 %in% vec2]
      # label each element with the vector its values are missing from
      setNames(list(missing_from_1, missing_from_2), c(nm1, nm2))
    }
    check_mismatch(x, y)
    # $x
    # [1] "D" "X"
    # $y
    # [1] "A"
    

    One immediate benefit is that we can pull out the differences for one of the vectors by name:

    mis <- check_mismatch(x, y)
    mis$x
    # [1] "D" "X"
    

    However, this uses whatever expressions are passed to it as names, and those are not always tidy variable names. With non-standard evaluation comes responsibility and consequence. Consider:

    mis <- check_mismatch(x, c("A", "B", "E"))
    mis
    # $x
    # [1] "E"
    # $`c("A", "B", "E")`
    # [1] "C"
    

    The name of the second element is atrocious. Fortunately, if all you care about is the differences for the second element, one can still use [[2]] to retrieve the character vector without issue. (This is mostly aesthetic.)

    mis[[2]]
    # [1] "C"
    
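  3. Also, one might want to compare more than two vectors, so generalizing it to "one or more" vectors might be useful. Each element of the result holds the values that appear in any of the other vectors but not in that one: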

    check_mismatch_many <- function(...) {
      dots <- list(...)
      if (!length(dots)) {
        out <- list()
      } else {
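        # the argument expressions, deparsed, become the element names
        nms <- as.character(match.call()[-1])
        out <- lapply(seq_along(dots), function(i) {
          # values found in any of the other vectors but not in this one
          b <- unique(unlist(dots[-i]))
          b[!b %in% dots[[i]]]
        })
        # with a single vector there are no "others", so b is NULL; use an
        # empty vector of the same type instead
        out <- replace(out, sapply(out, is.null), list(dots[[1]][0]))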
        names(out) <- nms
      }
      out
    }
    
    z <- c("Y","Z")
    check_mismatch_many()
    # list()
    check_mismatch_many(x)
    # $x
    # character(0)
    check_mismatch_many(x, y)
    # $x
    # [1] "D" "X"
    # $y
    # [1] "A"
    check_mismatch_many(x, y, z)
    # $x
    # [1] "D" "X" "Y" "Z"
    # $y
    # [1] "A" "Y" "Z"
    # $z
    # [1] "A" "B" "C" "D" "X"
    
  4. And finally, if you want to be a little "personal" with the presentation on the console, you can go overboard: give the result a class and add a print method for it (an S3 method, here print.mismatch).

    check_mismatch_many <- function(...) {
      dots <- list(...)
      if (!length(dots)) {
        out <- list()
      } else {
        nms <- as.character(match.call()[-1])
        out <- lapply(seq_along(dots), function(i) {
          b <- unique(unlist(dots[-i]))
          b[!b %in% dots[[i]]]
        })
        out <- replace(out, sapply(out, is.null), list(dots[[1]][0]))
        names(out) <- nms
      }
      class(out) <- c("mismatch", "list")
      out
    }
    print.mismatch <- function(x, ...) {
      cat("<Mismatch>\n")
      # str() prints on its own and returns NULL, so no cat() wrapper is needed
      str(x, give.attr = FALSE, no.list = TRUE)
      invisible(x)
    }
    mis <- check_mismatch_many(x, y)
    mis
    # <Mismatch>
    #  $ x: chr [1:2] "D" "X"
    #  $ y: chr "A"
    

    (There is a lot more you can do in the print.mismatch method, obviously. str does most of the work here; it is the Swiss Army knife of depicting structure.)
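Recommendations 1 and 2 can be combined in a single function. A minimal sketch, assuming a console report is still wanted: report_mismatch is a hypothetical name, and it swaps the !%in% filtering for base R's setdiff(), which returns the unique values of its first argument that do not appear in the second (the same result as unique() followed by the filter). It writes the summary with message() and returns the named list invisibly, so the data is not lost.

```r
x <- c("A", "B", "C")
y <- c("B", "C", "D", "X")

# Hypothetical variant: report with message(), return the data invisibly
report_mismatch <- function(vec1, vec2) {
  # capture the argument expressions before vec1/vec2 are overwritten
  nm1 <- paste(deparse(substitute(vec1)), collapse = "")
  nm2 <- paste(deparse(substitute(vec2)), collapse = "")
  vec1 <- unique(as.character(vec1))
  vec2 <- unique(as.character(vec2))
  # setdiff(a, b): unique values of a that do not appear in b
  out <- setNames(list(setdiff(vec2, vec1), setdiff(vec1, vec2)), c(nm1, nm2))
  for (i in seq_along(out)) {
    shown <- if (length(out[[i]])) paste(out[[i]], collapse = ", ") else "(none)"
    # message() goes to stderr with no [1] prefix; the leading # sets it apart
    message("# Missing from ", names(out)[i], ": ", shown)
  }
  invisible(out)
}

mis <- report_mismatch(x, y)
# # Missing from x: D, X
# # Missing from y: A
mis$x
# [1] "D" "X"
```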
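The question also asked about a table that copes with different numbers of mismatches. One option (an assumption, not part of the answer above) is a long data frame with one row per missing value; mismatch_table is a hypothetical helper that takes a named list like the one returned by check_mismatch or check_mismatch_many.

```r
# Hypothetical helper: flatten a named list of mismatches into a long table
mismatch_table <- function(mis) {
  data.frame(
    # repeat each vector's label once per value missing from it
    missing_from = rep(names(mis), lengths(mis)),
    value = unlist(mis, use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

mis <- list(x = c("D", "X"), y = "A")
mismatch_table(mis)
```

A vector with no mismatches simply contributes no rows, so the table works for any lengths, including the case where nothing is missing at all.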
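For the "atrocious" names shown in recommendation 2, one possible tweak (a hypothetical variant, not the answer's code) is to let a name supplied in the call override the deparsed expression. It coerces values to character, as check_mismatch does, so it assumes character-like input.

```r
# Hypothetical variant of check_mismatch_many(): a name given in the call
# (e.g. other = c("A", "B", "E")) replaces the deparsed expression as label
check_mismatch_named <- function(...) {
  dots <- list(...)
  # unevaluated argument expressions, one per element of ...
  exprs <- as.list(substitute(list(...)))[-1]
  # deparse() can split long expressions over several strings, so collapse
  nms <- vapply(exprs, function(e) paste(deparse(e), collapse = ""), character(1))
  given <- names(dots)
  if (!is.null(given)) nms[given != ""] <- given[given != ""]
  out <- lapply(seq_along(dots), function(i) {
    # values in the other vectors; as.character(NULL) gives character(0)
    b <- as.character(unique(unlist(dots[-i])))
    b[!b %in% dots[[i]]]
  })
  names(out) <- unname(nms)
  out
}

x <- c("A", "B", "C")
mis <- check_mismatch_named(x, other = c("A", "B", "E"))
mis
# $x
# [1] "E"
# $other
# [1] "C"
```

Unnamed arguments keep the deparsed label, so check_mismatch_named(x, y) still yields elements named x and y.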

r2evans