Why can't I filter on this column?

Question

I have a dataframe with a lot of rows and columns.

> ncol(stackdf)
[1] 1999
> nrow(stackdf)
[1] 662630

I can see that this particular column has 313 1s and the rest are zeros:

> stackdf[,"8470599.O"] %>% sort(decreasing = TRUE)                            
    [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
   [37] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
   [73] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  [109] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  [145] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  [181] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  [217] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  [253] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  [289] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
  [325] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [361] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [397] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [433] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [469] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [505] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Yet I can't filter for these values.

> stackdf %>% filter("8470599.O" == 1) %>% nrow()
[1] 0

> stackdf %>% filter("8470599.O" == 0) %>% nrow()
[1] 0

> stackdf %>% filter(is.na("8470599.O")) %>% nrow()
[1] 0

> stackdf %>% filter(!is.na("8470599.O")) %>% nrow()
[1] 662630

What's going on here? I assigned these 1s and 0s to this data myself. They're just numbers.

> typeof(stackdf[,"8470599.O"])
[1] "double"

These values look normal; I don't think they're weird characters or anything like that.

> stackdf %>% select("8470599.O") %>% pull %>% sort %>% unique
[1] 0 1

> str(stackdf[,"8470599.O"])
 num [1:662630] 1 0 0 0 0 0 0 0 0 0 ...

What's the issue here?
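The four `filter()` calls above return 0, 0, 0 and all rows because a quoted name inside a dplyr verb is a character constant, not a column. Each condition therefore evaluates to a single `TRUE` or `FALSE`, which `filter()` recycles across every row, while backticks turn the name into a column reference. A minimal sketch, using a small hypothetical data frame `toy` in place of `stackdf`:

```r
library(dplyr)

# Hypothetical toy data: one column with a non-syntactic name, like "8470599.O"
toy <- data.frame(`8470599.O` = c(1, 0, 0, 1, 0), check.names = FALSE)

# The quoted name is a length-1 character constant, not the column
"8470599.O" == 1    # FALSE: 1 is coerced to "1", which differs from the string
is.na("8470599.O")  # FALSE: a string literal is never NA

# filter() recycles the single TRUE/FALSE across all rows
n_quoted_eq1   <- toy %>% filter("8470599.O" == 1) %>% nrow()      # 0 rows
n_quoted_isna  <- toy %>% filter(is.na("8470599.O")) %>% nrow()    # 0 rows
n_quoted_notna <- toy %>% filter(!is.na("8470599.O")) %>% nrow()   # all 5 rows

# Backticks make it a column reference, so the comparison is row by row
n_backtick_eq1 <- toy %>% filter(`8470599.O` == 1) %>% nrow()      # 2 rows
```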

`"8470599.O"` is just a string. It works in `[` because `[` expects quoted column names. `dplyr` verbs expect unquoted column names, so you need backticks for non-standard column names: ``stackdf %>% filter(`8470599.O` == 1) %>% nrow()`` — Gregor Thomas, Oct 26 '20 at 13:52
You can also use the `subset` function in base R: ``subdf <- subset(stackdf, `8470599.O` == 1)`` — SteveM, Oct 26 '20 at 14:01
I use variables to refer to column names, and I don't see a way to get that to work. ``stackdf %>% filter(`8470599.O` == 1) %>% nrow()`` returns 313, but after `colNam <- "8470599.O"`, each of `stackdf %>% filter(colNam == 1) %>% nrow()`, ``stackdf %>% filter(`colNam` == 1) %>% nrow()`` and `stackdf %>% filter("colNam" == 1) %>% nrow()` returns 0. — Sebastian, Oct 26 '20 at 14:21
Here's a screenshot of what I'm trying to share: https://i.imgur.com/pV8uBKs.png. Do you see what I mean? — Sebastian, Oct 26 '20 at 14:23
Ah, I think I've got it: I have to use `get(colNam)`. I wish this were more intuitive! If you submit your backticks comment as an answer, I can accept it. — Sebastian, Oct 26 '20 at 15:57
If you're using variables with column names in dplyr, don't use `get`; use the approaches in [this FAQ](https://stackoverflow.com/q/26003574/903061) or see the package vignette on [Programming with dplyr](https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html). — Gregor Thomas, Oct 26 '20 at 18:04
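When the column name is held in a variable (here `colNam`, as in Sebastian's comment), the Programming with dplyr vignette describes the `.data` pronoun: `.data[[colNam]]` looks the string up as a column of the data instead of treating it as a constant. A minimal sketch, assuming dplyr 1.0 or later and a small hypothetical data frame `toy` standing in for `stackdf`, with a base R equivalent that needs no non-standard evaluation:

```r
library(dplyr)

# Hypothetical toy data with a non-syntactic column name
toy <- data.frame(`8470599.O` = c(1, 0, 0, 1, 0), check.names = FALSE)
colNam <- "8470599.O"  # column name stored in a variable

# dplyr: the .data pronoun resolves the string to the column
n_dotdata <- toy %>% filter(.data[[colNam]] == 1) %>% nrow()  # 2 rows

# Base R: [[ takes the column name as a string directly
n_base <- nrow(toy[toy[[colNam]] == 1, , drop = FALSE])       # 2 rows
```

Unlike `get(colNam)`, `.data[[colNam]]` only ever looks in the data frame's columns, so it cannot accidentally pick up an object of the same name from the calling environment.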
