The source of the error is that you are always comparing against nrow(df) rows, regardless of how many rows reach the second filter. For instance:
library(dplyr)
dat <- data.frame(a=1:10)
dat %>% filter(a > 5)
#    a
# 1  6
# 2  7
# 3  8
# 4  9
# 5 10
The way you're writing it, you're effectively doing:
dat %>% filter(dat[,1] > 5)
#    a
# 1  6
# 2  7
# 3  8
# 4  9
# 5 10
For this first call, the number of rows that go into filter is 10, and the number of rows being compared inside filter is also 10. However, if you were to do:
dat %>% filter(dat[,1] > 5) %>% filter(dat[,1] > 7)
# Error in filter_impl(.data, quo) : Result must have length 5, not 10
this fails because the number of rows going into the second filter is only 5, not 10, though we are giving the filter command 10 comparisons by using dat[,1].
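A quick way to see the mismatch is to compare the two lengths directly. This is only a sketch that reuses the dat frame from above; the intermediate name step1 is made up for illustration.

```r
library(dplyr)

dat <- data.frame(a = 1:10)

# Result of the first filter: only the rows with a > 5 survive.
step1 <- dat %>% filter(dat[, 1] > 5)

nrow(step1)            # rows entering the second filter: 5
length(dat[, 1] > 7)   # comparisons built from the original frame: 10
```

The second filter receives step1 but is handed a logical vector sized for dat, which is exactly the length complaint in the error message.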
(N.B.: many comments about names are perfectly appropriate, but let's continue with the theme of using column indices.)
The first trick is to give each filter only as many comparisons as the data coming in. Another way to say this is to do comparisons on the state of the data at that point in time. magrittr (and therefore dplyr) does this with the . placeholder. The dot can always be inferred (defaulting to the first argument of the RHS function, the function after %>%), but some feel that being explicit is better. For instance, this is legal:
mtcars %>%
  group_by(cyl) %>%
  tally()
# # A tibble: 3 x 2
#     cyl     n
#   <dbl> <int>
# 1     4    11
# 2     6     7
# 3     8    14
but an explicit equivalent pipe is this:
mtcars %>%
  group_by(., cyl) %>%
  tally(.)
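To see that the dot really is the data as it stands at each step, here is a small sketch using the dat frame from earlier. The variable name n_after is hypothetical, chosen only for this illustration.

```r
library(dplyr)

dat <- data.frame(a = 1:10)

# Inside each call, `.` is whatever the pipe is passing in at that step.
n_after <- dat %>%
  filter(., a > 5) %>%   # here `.` is the full 10-row frame
  nrow(.)                # here `.` is the filtered result

n_after
```

Because the second dot refers to the filtered frame rather than to dat, the count reflects the rows that survived the filter.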
If the first argument to the function is not the frame itself, then the %>% inferred way will fail:
mtcars %>%
  xtabs(~ cyl + vs)
# Error in as.data.frame.default(data, optional = TRUE) :
# cannot coerce class '"formula"' to a data.frame
(Because it is effectively calling xtabs(., ~cyl + vs), and without named arguments xtabs assumes the first argument is a formula.)
So we must be explicit in these situations:
mtcars %>%
  xtabs(~ cyl + vs, data = .)
#    vs
# cyl  0  1
#   4  1 10
#   6  3  4
#   8 14  0
(contrived example, granted). One could also do mtcars %>% xtabs(formula=~cyl+vs), but my point stands.
So to adapt your code, I would expect this to work:
df %>%
  filter(!.[,1] %in% stripcols) %>%
  filter(!.[,2] %in% stripcols)
I think I'd prefer the [[ approach (partly because I know that tbl_df and data.frame deal with [,1] slightly differently ... and though it works with it, I still prefer the explicitness of [[):
df %>%
  filter(!.[[1]] %in% stripcols) %>%
  filter(!.[[2]] %in% stripcols)
which should work. Of course, combining works just fine, too:
df %>%
  filter(!.[[1]] %in% stripcols, !.[[2]] %in% stripcols)
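For anyone who wants to try both forms side by side, here is a minimal sketch. The contents of df and stripcols are invented stand-ins (I don't know what your real data looks like), and chained and combined are just illustrative names.

```r
library(dplyr)

# Hypothetical stand-ins for the question's `df` and `stripcols`.
df <- data.frame(x = c("a", "b", "c", "d"),
                 y = c("b", "c", "d", "e"),
                 stringsAsFactors = FALSE)
stripcols <- c("a", "e")

# Two filters in sequence; each `.` is the frame entering that filter.
chained <- df %>%
  filter(!.[[1]] %in% stripcols) %>%
  filter(!.[[2]] %in% stripcols)

# One filter with both conditions; both `.` refer to the same incoming frame.
combined <- df %>%
  filter(!.[[1]] %in% stripcols, !.[[2]] %in% stripcols)

chained$x
combined$x
```

With this made-up data, the first row is dropped because of its first column and the last row because of its second, so both versions should keep the same two middle rows.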