Errors when using dplyr to perform conditional mutation

Question

I'm new to dplyr, but I've been searching for hours for an answer to this without much success. I am currently trying to write a function that will return a 0 or 1 depending on whether a time falls within a certain range, but only if it matches both the Date and City that is relevant. Here is the newest iteration of code I've come up with:

NEW_DF <- df1 %>% 
  full_join(df2, by="City", keep= TRUE) %>% 
  mutate(newvariable = case_when(
    df1$City == df2$City & df1$Dates == df2$Dates & df1$Start <= df2$Time <= df1$End ~ 1,
    df1$City == df2$City & df1$Dates == df2$Dates & df1$Start > df2$Time ~ 0,
    df1$City == df2$City & df1$Dates == df2$Dates & df1$End > df2$Time ~ 0,
    TRUE ~ NA_real_)) %>% 
  select(City=df2$City, Date=df2$Dates, Time=df2$Time, newvariable) %>% 
  semi_join(df2, by="City")

Ideally, this would result in a table where if given matching City and Dates, it would see if the time in df2 fell inside or outside of the Start/End range in df1. But I keep getting errors - the error for my newest code is this:

Error in select(City=df2$City, Date=df2$Dates, Time=df2$Time, newvariable), : object 'newvariable' not found

With different code, I got this error, which I'm only including to be thorough:

Error in UseMethod("select") : no applicable method for 'select' applied to an object of class "character"

I thought I might need to build a new vector for my new variable to populate, but that doesn't seem to work.

Hello, Thanks for including some code, but this is still very difficult to work with. Please consider writing a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) so that we can reproduce the error in our own session. — Justin Landis, Aug 26 '21 at 18:08

score 2 · Accepted Answer · answered Aug 26 '21 at 18:08

It's hard to be sure what the issue is without access to your data. When you are having issues, it's best to make a reproducible example that anyone can run to see the behavior. See https://reprex.tidyverse.org/

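But from the look of it, I think the issue is that you are using `$` inside the dplyr functions. Inside a pipeline, `df1$City` pulls the column from the original `df1`, not from the joined data that `mutate()` and `select()` are actually working on. Just refer to the columns of the data frame by name: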

NEW_DF <- df1 %>% 
  full_join(df2, by="City") %>% # You don't want keep = TRUE usually
  mutate(newvariable = case_when(
    # The City columns always match because that's how the two data frames were joined.
    # With full_join(), when the two data frames share column names other than the
    # columns listed in `by`, the suffixes `.x` and `.y` are added by default.
    # Finally, note that you can't string together multiple <= comparisons like you did above.
    Dates.x == Dates.y & Start <= Time & Time <= End ~ 1,
    Dates.x == Dates.y & Start > Time ~ 0,
    Dates.x == Dates.y & End > Time ~ 0,
    TRUE ~ NA_real_)) %>% 
  select(City, Date = Dates.y, Time, newvariable) %>% 
  semi_join(df2, by="City")
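To see the suffix behavior for yourself, here is a minimal sketch on made-up data. The real `df1` and `df2` were never posted, so the cities, dates, and numeric hour values below are assumptions chosen purely for illustration. It also shows why `df1$...` is risky after a join: the joined table can have a different number of rows than `df1`.

```r
library(dplyr)

# Hypothetical toy data; the question's real data was not shown.
df1 <- tibble(
  City  = c("Austin", "Boston"),
  Dates = as.Date(c("2021-08-01", "2021-08-01")),
  Start = c(9, 13),   # assumed: hours of the day as plain numbers
  End   = c(17, 15)
)
df2 <- tibble(
  City  = c("Austin", "Austin", "Boston"),
  Dates = as.Date(rep("2021-08-01", 3)),
  Time  = c(10, 18, 14)
)

joined <- full_join(df1, df2, by = "City")

# Dates exists in both inputs and is not in `by`, so it comes back
# as Dates.x (from df1) and Dates.y (from df2).
print(names(joined))

# The join has one row per matching pair, so it is longer than df1 here.
# A vector like df1$City would no longer line up with the joined rows.
print(nrow(df1))
print(nrow(joined))
```

Printing `names(joined)` before writing the `mutate()` step is a quick way to check which column names are actually available.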

That was the exact issue I was having, that makes perfect sense. Your response also helped me understand the way dplyr reads data much better, I seriously appreciate it. The only change I made was with the last line of the conditions to `Dates.x == Dates.y & End < Time ~ 0`, but that was an easy fix. — Jedi93, Aug 26 '21 at 20:21
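For anyone landing here later, this is a small self-contained sketch of the pipeline with the last condition written as `End < Time`, so that times after the range are flagged 0. The data is invented (the original data frames were not shared), and `Start`, `End`, and `Time` are assumed to be plain numeric hours; treat it as an illustration rather than the asker's exact code.

```r
library(dplyr)

# Hypothetical toy data covering each case: inside the range,
# before it, after it, and a date that does not match.
df1 <- tibble(
  City  = c("Austin", "Boston"),
  Dates = as.Date(c("2021-08-01", "2021-08-01")),
  Start = c(9, 13),
  End   = c(17, 15)
)
df2 <- tibble(
  City  = c("Austin", "Austin", "Austin", "Boston"),
  Dates = as.Date(c("2021-08-01", "2021-08-01", "2021-08-01", "2021-08-02")),
  Time  = c(10, 8, 18, 14)
)

result <- df1 %>%
  full_join(df2, by = "City") %>%
  mutate(newvariable = case_when(
    # inside the range: the comparison is split into two parts joined by &
    Dates.x == Dates.y & Start <= Time & Time <= End ~ 1,
    # before the range starts
    Dates.x == Dates.y & Start > Time ~ 0,
    # after the range ends
    Dates.x == Dates.y & End < Time ~ 0,
    # anything else, e.g. the dates do not match
    TRUE ~ NA_real_
  )) %>%
  select(City, Date = Dates.y, Time, newvariable)

print(result)
```

With this data, the three Austin rows come out as 1, 0, 0 and the Boston row as NA because its dates differ.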
