
I have a dataset with two date columns, each recording a different event. I want to generate a dummy variable that equals 1 if both the date and the time match. For example, if one value is 17/05/01 19:30:00 (YY/MM/DD format) and the other date column also contains 17/05/01 19:30:00, then the dummy variable equals 1.

I need the check to run against the entire other column of dates, not just against the cell in the same row. I am relatively new to R and may not be explaining this well, but say the date and time in the first date column are 19/06/04 14:23:00. I would like the function to search the entire other date column for the exact same date and timestamp and generate the dummy variable from that.

Any help is appreciated!

  • [See here](https://stackoverflow.com/q/5963269/5325862) on making a reproducible example that is easier for folks to help with. There's no clear example here for anyone to use to help you – camille Jan 17 '22 at 20:16
  • Please provide enough code so others can better understand or reproduce the problem. – Community Jan 27 '22 at 07:51

1 Answer


From your description, I think your columns are datetime and not just dates, correct? It’s an important distinction to make because dates and datetimes are separate classes in R. It’s also helpful if you provide a sample of your data so people can best understand and help. In the future you can use something like dput(head(df, 10)) and copy and paste the output to your question. For this question, I believe what you want should be quite straightforward.

Here we create some data for an example:

# Example datetimes stored as character strings (yy/mm/dd HH:MM:SS)
x <- c("22/01/01 08:00:00", "22/01/02 08:00:00", "22/01/03 08:00:00", "22/01/01 09:00:00")
# Convert to POSIXct so values are compared as datetimes rather than text
x <- as.POSIXct(x, format = "%y/%m/%d %H:%M:%S")
y <- c("22/01/01 09:00:00", "22/01/01 08:00:00", "22/01/01 09:00:00", "22/01/02 10:00:00")
y <- as.POSIXct(y, format = "%y/%m/%d %H:%M:%S")

df <- data.frame(x, y)

If your datetimes look like the ones in your question, I’m going to go out on a limb and say they are stored as character strings and need to be converted to POSIXct as I’ve done above. You can reuse the date portion of the format string, but you may need to change the time portion depending on how your times are written. I think this part may take longer than creating the indicator variable itself.
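As a rough sketch of that conversion step, assuming the raw column really is character text in the yy/mm/dd layout from the question: the vector `raw`, the deliberately bad third value, and the choice of `tz = "UTC"` are all made up for illustration. Anything that does not match the format string comes back as NA, which makes it easy to spot rows that need a different format.

```r
# Hypothetical character vector standing in for a raw date column
raw <- c("17/05/01 19:30:00", "19/06/04 14:23:00", "not a date")

# %y is a two-digit year; the time zone is set explicitly here (an assumption)
# so the result does not depend on the machine's local settings
parsed <- as.POSIXct(raw, format = "%y/%m/%d %H:%M:%S", tz = "UTC")

# Values that do not match the format are returned as NA
bad_rows <- which(is.na(parsed))
```

Checking `bad_rows` before building the indicator helps avoid silently comparing against missing values.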

The strategy is just to create a reference vector of the datetime you want to check against, then see if your datetime of interest is in there.

date_check <- df$y
df$x_in_y <- ifelse(df$x %in% date_check, "yes", "no")

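Here’s what it looks like. You’ll notice that datetimes are only considered equal if they are exactly the same, and that the match can come from any row of y, not just the same row (row 1 is "yes" because its x value appears in row 2 of y).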

df
#>                     x                   y x_in_y
#> 1 2022-01-01 08:00:00 2022-01-01 09:00:00    yes
#> 2 2022-01-02 08:00:00 2022-01-01 08:00:00     no
#> 3 2022-01-03 08:00:00 2022-01-01 09:00:00     no
#> 4 2022-01-01 09:00:00 2022-01-02 10:00:00    yes

Created on 2022-01-17 by the reprex package (v2.0.1)
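Since the question asks for a dummy variable equal to 1 rather than "yes"/"no", here is a minimal variation on the same idea. It rebuilds the example data so it runs on its own; the column name `x_in_y_dummy` and the explicit `tz = "UTC"` are my own choices for illustration, not something from the question.

```r
# Same example data as above, with an explicit time zone (an assumption)
fmt <- "%y/%m/%d %H:%M:%S"
x <- as.POSIXct(c("22/01/01 08:00:00", "22/01/02 08:00:00",
                  "22/01/03 08:00:00", "22/01/01 09:00:00"),
                format = fmt, tz = "UTC")
y <- as.POSIXct(c("22/01/01 09:00:00", "22/01/01 08:00:00",
                  "22/01/01 09:00:00", "22/01/02 10:00:00"),
                format = fmt, tz = "UTC")
df <- data.frame(x, y)

# %in% returns TRUE/FALSE for each x; as.integer() turns that into 1/0
df$x_in_y_dummy <- as.integer(df$x %in% df$y)
```

With this data the new column is 1, 0, 0, 1, matching the yes/no column above.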

TrainingPizza