I want to filter specific rows from my dataset, and I want to store the values that identify those rows in variables before calling filter(). Whenever I do that, I get 0 observations.

This is what I want to do, but it returns 0 observations:

name <- "my_dna_42_x"
gene <- "my_gene_12213"
df2 <- df1 %>% group_by(DNA, ID) %>% filter(any(DNA == name && ID == gene)) 

This does work, but it is not what I want: I need to be able to define name and gene before running the filter, and later turn this into a function.

df2 <- df1 %>% group_by(DNA, ID) %>% filter(any(DNA == "my_dna_42_x" && ID == "my_gene_12213")) 

So how can I get filter() to accept the DNA name and gene ID when they are defined earlier as variables?

(I also tried parse_expr(paste(name)), and defining name as a symbol with name <- sym("my_dna_42_x"), but neither worked.)

SOLVED: name was already a column name
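
For anyone landing here with the same symptom, here is a minimal sketch of the collision. It assumes a made-up `df1` that has a column called `name`; the real `df1` was never posted, so the data and the function name `filter_dna_gene` below are hypothetical. Inside `filter()`, dplyr looks a bare name up in the data frame's columns first, so `DNA == name` compares against the column rather than the outer variable. The `.env` pronoun, or simply renaming the variable, sends the lookup to the calling environment instead.

```r
library(dplyr)

# Hypothetical stand-in for df1; the `name` column is what causes the clash.
df1 <- data.frame(
  DNA  = c("my_dna_42_x", "my_dna_42_x", "other_dna"),
  ID   = c("my_gene_12213", "my_gene_12213", "other_gene"),
  name = c("a", "b", "c")
)

name <- "my_dna_42_x"
gene <- "my_gene_12213"

# `name` resolves to the column df1$name, so nothing matches: 0 rows.
masked <- df1 %>% filter(DNA == name & ID == gene)

# `.env$name` skips the columns and uses the variable defined above: 2 rows.
fixed <- df1 %>% filter(DNA == .env$name & ID == .env$gene)

# Wrapped as a function; the argument names are illustrative only.
filter_dna_gene <- function(data, dna_value, gene_value) {
  data %>% filter(DNA == .env$dna_value & ID == .env$gene_value)
}
from_fun <- filter_dna_gene(df1, name, gene)
```

Using `.env$` inside the function also protects it against any future column that happens to share a name with one of its arguments.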

QUESTION WITH EXAMPLE DATA:

set.seed(42) 
n <- 6
dat <- data.frame(id=1:n, 
                  date=seq.Date(as.Date("2020-12-26"), as.Date("2020-12-31"), "day"),
                  group=rep(LETTERS[1:2], n/2),
                  age=sample(18:30, n, replace=TRUE),
                  type=factor(paste("type", 1:n)),
                  x=rnorm(n))
my_type <- "type 1"
filtered_dat <- dat %>% group_by(id, type) %>% filter(type == "my_type")

The problem is in the last two lines: I define my_type and then try to use it inside filter().

  • Vectorized! Use `&` instead of `&&`. – r2evans Apr 28 '21 at 13:46
  • Please read https://stackoverflow.com/q/6558921/3358272 (and https://stackoverflow.com/q/16027840/3358272), this is most likely an issue with vectorized logical conditioning. (Beyond that, there's little we can do without sample data, please read https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info for tips on how to make a question *reproducible*.) – r2evans Apr 28 '21 at 13:48

1 Answer

This is likely a dupe (of the link I provided in my comment), but for your case:

name <- "my_dna_42_x"
gene <- "my_gene_12213"
df2 <- df1 %>%
  group_by(DNA, ID) %>%
  filter(any(DNA == name & ID == gene))
###                      ^--- single '&'

See the difference between

c(TRUE, TRUE) && c(TRUE, FALSE)
# [1] TRUE    (R < 4.2.0; R 4.2.x also warns, and R >= 4.3.0 throws an error here)
c(TRUE, TRUE) & c(TRUE, FALSE)
# [1]  TRUE FALSE
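
Applied to the sample data added to the question, the same idea can be sketched as follows. The data frame is copied from the question; the result names `one_row`, `no_rows` and `whole_group` are just labels chosen for this illustration. The example avoids the randomly generated `age` and `x` columns so that the row counts in the comments do not depend on the seed.

```r
library(dplyr)

set.seed(42)
n <- 6
dat <- data.frame(id = 1:n,
                  date = seq.Date(as.Date("2020-12-26"), as.Date("2020-12-31"), "day"),
                  group = rep(LETTERS[1:2], n / 2),
                  age = sample(18:30, n, replace = TRUE),
                  type = factor(paste("type", 1:n)),
                  x = rnorm(n))

my_type <- "type 1"

# Unquoted: compares the `type` column with the value stored in my_type (1 row).
one_row <- dat %>% filter(type == my_type)

# Quoted: compares against the literal string "my_type", which never occurs (0 rows).
no_rows <- dat %>% filter(type == "my_type")

# Group-level: keep every row of any group that contains a match.
# "type 1" sits in group A, so all 3 rows of group A are kept.
whole_group <- dat %>%
  group_by(group) %>%
  filter(any(type == my_type)) %>%
  ungroup()
```

Note that `any()` collapses the per-row comparison to one value per group, which is why the grouped version returns whole groups rather than single rows.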
r2evans
  • That doesn't work, maybe I'm defining the name and gene wrong? I wrote ```name <- sym("some_name")``` and the code does work when I write ```df2 <- df1 %>% group_by(DNA, ID) %>% filter(any(DNA == "some_name" & ID == "some_gene"))``` – Luckystrikerr Apr 28 '21 at 14:04
  • You don't need to use `sym(.)` here, just use the variables you had as you started your question. – r2evans Apr 28 '21 at 14:07
  • That still doesn't work and I now have example data so I will add that to the question above. Thanks for the help by the way! – Luckystrikerr Apr 28 '21 at 14:11
  • Could you look at this code: ```set.seed(42) n <- 6 dat <- data.frame(id=1:n, date=seq.Date(as.Date("2020-12-26"), as.Date("2020-12-31"), "day"), group=rep(LETTERS[1:2], n/2), age=sample(18:30, n, replace=TRUE), type=factor(paste("type", 1:n)), x=rnorm(n)) my_type <- "type 1" filtered_dat <- dat %>% group_by(id, type) %>% filter(type == "my_type")``` Because now I don't use ```&``` or ```&&``` at all and it still doesn't work. – Luckystrikerr Apr 28 '21 at 14:18
  • Why are you trying to quote a variable? Just do `... %>% filter(type == my_type)`. – r2evans Apr 28 '21 at 14:21
  • Thanks, that works but it still doesn't work for my dataset, maybe because there are many rows with the same ID and DNA name? – Luckystrikerr Apr 28 '21 at 14:50
  • Please be clear: you are no longer quoting your variables (`type == my_type`), ***and*** you have switched from using `&&` to using `&`, and still your filtering is broken? If that's the case, then *please* (again) upload representative sample data that does what you want. As an example, this works with your usable sample data: `dat %>% group_by(group) %>% filter(any(type == my_type & age < 20))` (where `20` is a literal, but could easily be replaced with `my_age <- 20` and then use `filter(any(type == my_type & age < my_age))`). – r2evans Apr 28 '21 at 14:54
  • It worked and I used ```&```. The mistake was using ```name``` while that was already a column name. Thanks for teaching me more about vectors and operators though! :) – Luckystrikerr Apr 28 '21 at 15:08
  • Ah, yes. (Not to beat a dead horse, but ... if you had provided a *representative* sample of the original `df1` with that as a column name *and* a variable outside of the frame, that would have been a fast and clear distinction.) – r2evans Apr 28 '21 at 15:10