I'm new to dplyr, and I've been searching for hours for an answer to this without much success. I am trying to write code that returns a 0 or 1 depending on whether a time falls within a certain range, but only for rows where both the Date and the City match. Here is the newest iteration of the code I've come up with:

NEW_DF <- df1 %>% 
  full_join(df2, by="City", keep= TRUE) %>% 
  mutate(newvariable = case_when(
    df1$City == df2$City & df1$Dates == df2$Dates & df1$Start <= df2$Time <= df1$End ~ 1,
    df1$City == df2$City & df1$Dates == df2$Dates & df1$Start > df2$Time ~ 0,
    df1$City == df2$City & df1$Dates == df2$Dates & df1$End > df2$Time ~ 0,
    TRUE ~ NA_real_)) %>% 
  select(City=df2$City, Date=df2$Dates, Time=df2$Time, newvariable) %>% 
  semi_join(df2, by="City")

Ideally, this would result in a table where if given matching City and Dates, it would see if the time in df2 fell inside or outside of the Start/End range in df1. But I keep getting errors - the error for my newest code is this:

Error in select(City=df2$City, Date=df2$Dates, Time=df2$Time, newvariable), : object 'newvariable' not found

With different code, I got this error, which I'm only including to be thorough:

Error in UseMethod("select") : no applicable method for 'select' applied to an object of class "character"

I thought I might need to build a new vector for my new variable to populate, but that doesn't seem to work.

Jedi93
  • Hello, Thanks for including some code, but this is still very difficult to work with. Please consider writing a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) so that we can reproduce the error in our own session. – Justin Landis Aug 26 '21 at 18:08

1 Answer

It's hard to be sure what the issue is without access to your data. When you are having issues, it's best to make a reproducible example that anyone can run to see the behavior. See https://reprex.tidyverse.org/

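But from the look of it, I think the issue is that you are using $ inside the dplyr functions. That doesn't work the way you expect: `df1$City` pulls the column from the original, un-joined data frame rather than from the joined data being piped through, so the rows won't line up. Just refer to the columns of the joined data frame by name: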

NEW_DF <- df1 %>% 
  full_join(df2, by="City") %>% # You don't want keep = TRUE usually
  mutate(newvariable = case_when(
    # The City columns always match, because that's how the two data frames were joined.
    # With full_join(), when the two data frames share column names other than those
    # listed in `by`, the suffixes `.x` and `.y` are added by default.
    # Finally, note that you can't chain multiple <= comparisons like you did above;
    # write each comparison separately and combine them with &.
    Dates.x == Dates.y & Start <= Time & Time <= End ~ 1,
    Dates.x == Dates.y & Start > Time ~ 0,
    Dates.x == Dates.y & End < Time ~ 0,
    TRUE ~ NA_real_)) %>% 
  select(City, Date = Dates.y, Time, newvariable) %>% 
  semi_join(df2, by="City")
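
Since the original data isn't available, here is a small self-contained sketch of the same pattern that can be run as-is. Everything in it is an assumption made for illustration: the toy `df1` and `df2`, the city names, and storing times as plain numeric hours (real data may well use a time class instead). The two "outside the range" conditions are merged into one, so times before `Start` and after `End` both give 0.

```r
library(dplyr)

# Hypothetical toy data; times are numeric hours purely for simplicity
df1 <- tibble(
  City  = c("Austin", "Boston"),
  Dates = as.Date(c("2021-08-01", "2021-08-01")),
  Start = c(9, 13),
  End   = c(17, 15)
)

df2 <- tibble(
  City  = c("Austin", "Austin", "Austin", "Boston", "Boston"),
  Dates = as.Date(c("2021-08-01", "2021-08-01", "2021-08-01",
                    "2021-08-01", "2021-08-02")),
  Time  = c(10, 8, 18, 14, 14)
)

NEW_DF <- df1 %>%
  # Dates exists in both data frames, so it becomes Dates.x (df1) and Dates.y (df2)
  full_join(df2, by = "City") %>%
  mutate(newvariable = case_when(
    # same date, time inside the window
    Dates.x == Dates.y & Start <= Time & Time <= End ~ 1,
    # same date, time before Start or after End
    Dates.x == Dates.y & (Time < Start | Time > End) ~ 0,
    # dates differ, or a row had no match in the other data frame
    TRUE ~ NA_real_
  )) %>%
  select(City, Date = Dates.y, Time, newvariable)

print(NEW_DF)
```

With this toy data, the three Austin rows should come out as 1, 0, 0 (inside, before, and after the window), the Boston row on the matching date as 1, and the Boston row on the non-matching date as NA.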
Brenton M. Wiernik
    That was the exact issue I was having, that makes perfect sense. Your response also helped me understand the way dplyr reads data much better, I seriously appreciate it. The only change I made was with the last line of the conditions to `Dates.x == Dates.y & End < Time ~ 0`, but that was an easy fix. – Jedi93 Aug 26 '21 at 20:21