
The setdiff() function in dplyr is behaving differently when I execute it interactively vs in a .Rmd document with knitr. Specifically,

who_comp <- who %>% complete(country, year)
imp_miss <- dplyr::setdiff(union(who, who_comp), intersect(who, who_comp))

gives a dataframe/tibble interactively, but a list with knitr. Wrapping setdiff() with as.tibble() gives an error because the hypothetical tibble doesn't have any column names.

Why is it doing this, and is there a way to make it stop? It's the only function I've encountered that shows this difference.

Frank
MissMonicaE
  • Shouldn't you use `dplyr::union` and `dplyr::intersect` as well? – Frank Nov 14 '17 at 18:22
  • @Frank Thank you! That seems to fix it--I guess it was using `base::union()`, which gives a list. Do you know why it does that? Why would a list be a "common mode" for two tibbles? Strangely, `base::intersect(who, who_comp)` gives an empty dataframe while `dplyr::intersect(who, who_comp)` gives a 7240x60 dataframe. – MissMonicaE Nov 14 '17 at 18:35
  • The `?sets` functions from base were designed only for simple vectors and were not made to apply differently depending on input. A tibble is a vector of columns, so maybe the base union function tries to give you the union of columns, etc. – Frank Nov 14 '17 at 18:41
  • Do you need a `library("dplyr")` at the top of your markdown document? – Lionel Henry Nov 14 '17 at 18:53
  • Generally, you can catch these things by seeing the messages shown after loading the library, messages like "The following object is masked"... I'm not sure if there's a good way to be aware of them without doing that, though. – Frank Nov 14 '17 at 18:54
  • @Frank See `?conflicts` for things that are loaded. – lmo Nov 14 '17 at 18:59
  • I'm not convinced this is a duplicate--neither of the linked questions addresses why knitr defaults to `base` even though interactive sessions default to `dplyr` (which is listed first in `search()`). – MissMonicaE Nov 14 '17 at 20:22
  • In your interactive session, you used `library(dplyr)` or `library(tidyverse)` so the dplyr versions are on path, right? – Frank Nov 15 '17 at 00:06
  • @Frank Yes, and I have them in the setup chunk of my .Rmd as well. – MissMonicaE Nov 15 '17 at 13:42
  • Hm, odd. If you can post a small reproducible rmd, I'd be happy to reopen. – Frank Nov 15 '17 at 14:02
  • Random thought: can you see if the problem goes away if you do the `union()` and `intersect()` calls at the top-level, rather than inside `setdiff()` call? – hadley Nov 15 '17 at 14:11
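
A minimal sketch of what the comments above describe, for anyone who wants to reproduce it. The data frames `a` and `b` are made-up stand-ins for `who` and `who_comp`, and the names `base_result` and `sym_diff` are only illustrative. It shows that `base::union()` on two data frames returns a plain list, and that namespacing all three calls keeps the result a data frame. It does not explain why the knitr session would pick up the base versions in the first place.

```r
# Hypothetical stand-ins for `who` and `who_comp`; not the real data.
a <- data.frame(country = c("A", "B"), year = c(2000, 2001),
                stringsAsFactors = FALSE)
b <- data.frame(country = c("B", "C"), year = c(2001, 2002),
                stringsAsFactors = FALSE)

# base::union() sees each data frame as a list of columns, so the result
# is a plain list of columns rather than a data frame.
base_result <- base::union(a, b)
print(is.data.frame(base_result))

# Namespacing every call means the data-frame versions are used no matter
# what is attached or masked. Guarded so the sketch still runs without dplyr.
if (requireNamespace("dplyr", quietly = TRUE)) {
  sym_diff <- dplyr::setdiff(dplyr::union(a, b), dplyr::intersect(a, b))
  print(is.data.frame(sym_diff))
}
```

Running `conflicts(detail = TRUE)` inside a chunk of the .Rmd, as suggested above, should show which package's `union()`, `intersect()` and `setdiff()` come first in that session.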

0 Answers