I have a dataframe with a lot of rows and columns.

> ncol(stackdf)
[1] 1999
> nrow(stackdf)
[1] 662630

I can see that this particular column has 313 1s and the rest of it is zeroes:

> stackdf[,"8470599.O"] %>% sort(decreasing = TRUE)                            
    [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
   [37] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
   [73] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  [109] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  [145] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  [181] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  [217] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  [253] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  [289] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
  [325] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [361] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [397] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [433] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [469] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  [505] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Yet I can't filter for these values.

> stackdf %>% filter("8470599.O" == 1) %>% nrow()
[1] 0

> stackdf %>% filter("8470599.O" == 0) %>% nrow()
[1] 0

> stackdf %>% filter(is.na("8470599.O")) %>% nrow()
[1] 0

> stackdf %>% filter(!is.na("8470599.O")) %>% nrow()
[1] 662630

What's going on here? I assigned these 1s and 0s myself to this data. It's just numbers.

> typeof(stackdf[,"8470599.O"])
[1] "double"

These values look normal, I don't think they're weird characters or anything like that.

> stackdf %>% select("8470599.O") %>% pull %>% sort %>% unique
[1] 0 1

> str(stackdf[,"8470599.O"])
 num [1:662630] 1 0 0 0 0 0 0 0 0 0 ...

What's the issue here?

Sebastian
  • `"8470599.O"` is just a string. It works in `[` because `[` expects quoted column names. `dplyr` verbs expect unquoted column names, so you need backticks for non-standard column names: `stackdf %>% filter(\`8470599.O\` == 1) %>% nrow()` – Gregor Thomas Oct 26 '20 at 13:52
  • You can also use the `subset` function in base R, again with backticks around the column name: `subdf <- subset(stackdf, \`8470599.O\` == 1)` – SteveM Oct 26 '20 at 14:01
  • Thank you. It's the backticks that fix it. – Sebastian Oct 26 '20 at 14:18
  • I use variables to refer to column names and I don't see a way to get that to work? I also can't get the formatting to work in this comment, sorry about this... I'll use 'tick' to represent where a backtick is.
    > stackdf %>% filter(tick8470599.Otick == 1) %>% nrow()
    [1] 313
    > colNam <- "8470599.O"
    > stackdf %>% filter(colNam == 1) %>% nrow()
    [1] 0
    > stackdf %>% filter(tickcolNamtick == 1) %>% nrow()
    [1] 0
    > stackdf %>% filter("colNam" == 1) %>% nrow()
    [1] 0
    – Sebastian Oct 26 '20 at 14:21
  • Here's a screenshot of what I'm trying to share https://i.imgur.com/pV8uBKs.png you see what I mean? – Sebastian Oct 26 '20 at 14:23
  • Ah I think I got it, I have to use get(colNam). Wish this was more intuitive! If you submit your backticks comment as an answer I can accept it. – Sebastian Oct 26 '20 at 15:57
  • If you're using variables with column names in dplyr, don't use `get` - use [this FAQ](https://stackoverflow.com/q/26003574/903061) or see the package vignette on [Programming with dplyr](https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html). – Gregor Thomas Oct 26 '20 at 18:04
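
To pull the comments above together, here is a minimal sketch on a toy data frame. `toy`, `col_nam`, and the `n_*` names are hypothetical stand-ins, not part of the original data. It assumes `dplyr` is installed and uses the `.data` pronoun described in the Programming with dplyr vignette as the alternative to `get()`. A quoted name inside `filter()` is just a string constant, so `"8470599.O" == 1` evaluates to a single `FALSE` that is recycled to every row, which would explain the zero-row results in the question.

```r
library(dplyr)

# Toy stand-in for stackdf (hypothetical data) with one non-syntactic column name.
# check.names = FALSE stops data.frame() from rewriting the name to X8470599.O.
toy <- data.frame(`8470599.O` = c(1, 0, 0, 1, 0), other = 1:5, check.names = FALSE)

# Quoted name: compares the string "8470599.O" with 1, giving a single FALSE.
n_string <- toy %>% filter("8470599.O" == 1) %>% nrow()

# Backticks: refers to the column itself.
n_backtick <- toy %>% filter(`8470599.O` == 1) %>% nrow()

# Column name held in a variable: index the .data pronoun with it.
col_nam <- "8470599.O"
n_pronoun <- toy %>% filter(.data[[col_nam]] == 1) %>% nrow()

# Base R equivalent: [[ accepts the quoted name directly.
n_base <- nrow(toy[toy[[col_nam]] == 1, , drop = FALSE])

print(c(n_string, n_backtick, n_pronoun, n_base))
# [1] 0 2 2 2
```

The same reasoning covers the other two calls in the question: `is.na("8470599.O")` is `FALSE` for the string itself, so that filter keeps no rows, while `!is.na("8470599.O")` is `TRUE` and keeps all of them.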

0 Answers