
I have data from different dates in R, and I need to remove every row whose timestamp is not between 08:00:00 and 18:00:00.

Is it easier to do this with a single column holding the date and time together, or should I first separate them into one column with the date and another with the time?

timestamp Price
2001-02-02 20:15:00 0.01
2001-02-02 21:17:00 0.05
2001-02-03 10:10:00 0.03

My real data has a value for nearly every minute of the day between 2005 and 2020. For the small example above, only the 2001-02-03 10:10:00 row (with its price) should be left.

Thanks for your help!

  • Also somewhat related if you want to "do things" with times in R: [Convert hour:minute:second (HH:MM:SS) string to proper time class](https://stackoverflow.com/questions/12034424/convert-hourminutesecond-hhmmss-string-to-proper-time-class) – Henrik Aug 19 '21 at 16:55

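Base R has no time-of-day (no-date) class, but you don't need one: format the timestamps as "%H:%M:%S" strings and ordinary comparisons order them correctly, because %H, %M and %S are always zero-padded to two digits. That also means you don't have to split date and time into separate columns.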
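A quick way to convince yourself of this: fixed-width, zero-padded strings compare character by character in the same order as the times they stand for. The padding is what makes it work; this small sketch (values made up for illustration) also shows what goes wrong without it.

```r
# Zero-padded "HH:MM:SS" strings: every field has a fixed width,
# so comparing characters left to right matches comparing the times
"09:30:00" < "10:15:00"
# [1] TRUE
"17:59:59" <= "18:00:00"
# [1] TRUE

# Without zero-padding the order breaks: "9" sorts after "1"
"9:30:00" < "10:15:00"
# [1] FALSE
```

Since format() with %H always pads to two digits, strings produced that way are safe to compare.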

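# sample data recreated with dput(); you don't type this, it stands in for your own data frame
# (tzone set to "America/New_York", assuming US Eastern time, which reproduces the printed times below in any locale)
dat <- structure(list(timestamp = structure(c(981162900, 981166620, 981213000), class = c("POSIXct", "POSIXt"), tzone = "America/New_York"), Price = c(0.01, 0.05, 0.03)), row.names = c(NA, -3L), class = "data.frame")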

# assuming `timestamp` is `POSIXt` class
format(dat$timestamp, format = "%H:%M:%S")
# [1] "20:15:00" "21:17:00" "10:10:00"
class(format(dat$timestamp, format = "%H:%M:%S"))
# [1] "character"
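One caveat: format() renders the time of day in the time zone attached to the vector (or the session's zone if none is set), so the same instant can fall inside or outside the 08:00 to 18:00 window depending on the zone. A small sketch using the first sample timestamp; the zones "UTC" and "America/New_York" are only examples, so use whichever zone your data was recorded in.

```r
# The first sample timestamp as a single POSIXct instant, stored in UTC
x <- as.POSIXct(981162900, origin = "1970-01-01", tz = "UTC")

# Same instant, two different wall-clock times
format(x, format = "%H:%M:%S")
# [1] "01:15:00"
format(x, format = "%H:%M:%S", tz = "America/New_York")
# [1] "20:15:00"
```

If your timestamps were parsed without a zone, passing tz = to as.POSIXct() (or to format()) makes the filter use the intended clock.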

library(dplyr)  # provides between(); data.table::between() behaves the same here
between(format(dat$timestamp, format = "%H:%M:%S"), "08:00:00", "18:00:00")
# [1] FALSE FALSE  TRUE

This works with both dplyr::between and data.table::between. If you use neither package, plain comparisons work just as well, and you can mimic those functions with your own:

mybetween <- function(x, a, b) x >= a & x <= b
mybetween(format(dat$timestamp, format = "%H:%M:%S"), "08:00:00", "18:00:00")
# [1] FALSE FALSE  TRUE
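If you'd rather not rely on string comparison at all, a different technique (not the one used above) is to convert each timestamp to seconds since midnight and compare numbers. secs_of_day() below is a hypothetical helper written for this sketch; it uses only the hour, min and sec components of base R's POSIXlt class, and assumes UTC for the sample timestamps. The time of day is still extracted on the fly, so no separate date and time columns are needed.

```r
# Hypothetical helper (not part of base R or any package):
# seconds elapsed since midnight, in the timestamp's own time zone
secs_of_day <- function(x) {
  lt <- as.POSIXlt(x)
  lt$hour * 3600 + lt$min * 60 + lt$sec
}

# The three sample timestamps (assumption: recorded in UTC)
ts <- as.POSIXct(c("2001-02-02 20:15:00", "2001-02-02 21:17:00",
                   "2001-02-03 10:10:00"), tz = "UTC")
s <- secs_of_day(ts)
s
# [1] 72900 76620 36600

# 08:00:00 is 8 * 3600 seconds, 18:00:00 is 18 * 3600; inclusive on both ends
in_window <- s >= 8 * 3600 & s <= 18 * 3600
in_window
# [1] FALSE FALSE  TRUE
```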
  • Thanks for the quick response. I don't fully understand the idea behind the structure. If I have a data frame with about 1 million timestamps, wouldn't it be too much work to type these for every timestamp? And does the first line of "mybetween" write the remaining data into the data frame? – Hawky Aug 19 '21 at 16:53
  • `dat[between(...),]` will return those rows that are between the two times, regardless of whether it is 3 rows or 3M. – r2evans Aug 19 '21 at 17:43
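To make that concrete: the comparison is vectorized, so one call yields a TRUE/FALSE per row, and indexing the data frame with it keeps the matching rows; nothing is typed per timestamp. A self-contained sketch, assuming the three sample rows and UTC as the time zone (substitute your own data frame and zone):

```r
# The three rows from the question, parsed with an explicit time zone
# (assumption: UTC; use the zone your data was recorded in)
dat <- data.frame(
  timestamp = as.POSIXct(c("2001-02-02 20:15:00", "2001-02-02 21:17:00",
                           "2001-02-03 10:10:00"), tz = "UTC"),
  Price = c(0.01, 0.05, 0.03)
)

# Same helper as in the answer: inclusive on both ends
mybetween <- function(x, a, b) x >= a & x <= b

# One vectorized call covers every row, whether there are 3 or 3 million
hms <- format(dat$timestamp, format = "%H:%M:%S")
kept <- dat[mybetween(hms, "08:00:00", "18:00:00"), ]
nrow(kept)
# [1] 1
kept$Price
# [1] 0.03
```

mybetween() itself only returns a logical vector and never modifies dat; if you want to drop the other rows permanently, assign the result back (dat <- kept).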