0

I am working with R where I have a variable '2 month 3 day 6 hour 70 minute' as string. The variable changes over time and therefore does not have the same length/structure. I need this variable to do a query on a PostgreSQL database by casting it to an interval. This works just fine.

Now I need this interval/string-variable as integer in minutes to do some mathematical calculations.

I thought of using sqldf the following:

library(sqldf)
my_interval = '2 month 3 day 6 hour 70 minute'
interval_minutes <- sqldf(paste("SELECT EXTRACT(EPOCH FROM '",my_interval,"'::INTERVAL)/60"))
interval_minutes_novar <- sqldf("SELECT EXTRACT(EPOCH FROM '2 month 3 day 6 hour 70 minute'::INTERVAL)/60")

but am getting Error: near "FROM": syntax error. From my research I know that sqldf uses SQLite, which does not support EXTRACT().

How can I convert a SQL-Interval to minutes using R?

MrXsquared
  • 173
  • 4
  • 15
  • Does it have to be done in SQL? It might be simpler in R. Either way, you said that the format varies, can you include more variants? – r2evans Jan 14 '22 at 12:38
  • Also ... how can one convert "months" to a deterministic number of seconds? It's not a constant, varying between 2,419,200 and 2,678,400 seconds per month. – r2evans Jan 14 '22 at 12:42
  • @r2evans no, it does not have to be done using SQL. Actually the question is how to do it in R. SQL was only my attempt and workaround. Good point on how to dermine "month". As far as I know its always 30 days. – MrXsquared Jan 14 '22 at 12:49
  • In reality I can just type `64 days` or `63 days` instead of `2 month 3 day`, so thats a theoretical issue. – MrXsquared Jan 14 '22 at 12:56
  • @MrXsquared, or `62 days`, or `65 days`. Does the hours and minutes really matter when you don't even know the number of days? – jarlh Jan 14 '22 at 13:12

3 Answers3

2

1) sqldf/gsubfn Using gsubfn replace each word in my_interval with *, the appropriate number of minutes and + . Remove any trailing + and spaces and then either parse and evaluate mins or substitute mins into the sql statement. There are 365.25 / 12 days in the average month over 4 calendar years, having one leap year, but if you want to get the same answer as PostgreSQL replace 365.25 / 12 with 30, as noted in the comments.

library(sqldf)  # this also pulls in gsubfn

# input
my_interval = '2 month 3 day 6 hour 70 minute'

L <- list(minute = " +", hour = "*60 +", day = "*60*24 +", 
       month = "*365.25 * 60 * 24 /12 +")
mins <- my_interval |>
  gsubfn(pattern = "\\w+", replacement = L) |> 
  trimws(whitespace = "[+ ]")

eval(parse(text = mins))
## [1] 92410

fn$sqldf("select $mins mins")
##    mins
## 1 92410

2) Base R This is a base R solution. Extract the numbers and words into separate vectors, translate the words to the appropriate factors and take their inner product. The discussion about 30 days months in (1) applies here too.

v <- c(minute = 1, hour = 60, day = 60 * 24, month = 365.25 * 60 * 24 /12)
nums <- my_interval |>
  gsub(pattern = "[a-z]", replacement = "") |>
  textConnection() |>
  scan(quiet = TRUE)
words <- my_interval |>
  gsub(pattern = "\\d", replacement = "") |>
  textConnection() |>
  scan(what = "", quiet = TRUE)
sum(v[words] * nums)
## [1] 92410

3) lubridate lubridate duration objects can be used.

library(lubridate)
as.numeric(duration(my_interval), "minute")
## [1] 92410

Although lubridate does not handle 30 day months (and Hadley says it is not planned) we can preprocess my_interval to get the effect.

library(gsubfn)
library(lubridate)

my_interval |>
 gsubfn(pattern = "(\\d+) +month", replacement = ~paste(30*as.numeric(x),"day")) |>
 duration() |>
 as.numeric("minute")
## [1] 91150
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • Ooh, I like this much better than mine. Nice! – r2evans Jan 14 '22 at 12:58
  • Neat solutions. In fact they return more accurate results than PostgreSQL (which uses 360 days per year and 30 days per month). For future readers I just want to add that you need to be aware about this fact: **The results differ**. For solutions 1) and 2) you can replace `365.25` with `360` to get the same [results as you would in PostgreSQL](https://stackoverflow.com/a/70710331/8947209). – MrXsquared Jan 14 '22 at 13:44
1

Adapting my answer here to this, I'll restate a rather gaping problem with this conversion: convert "month" into "seconds" is not constant, as months vary between 28-31 days. If we assume 30, though, for the sake of arguments, then:

func <- function(x, ptn) {
  out <- gsub(paste0(".*?\\b([0-9.]+)\\s*", ptn, ".*"), "\\1", x, ignore.case = TRUE)
  ifelse(out == x, NA, out)
}
res1 <- lapply(c(mon = "month", day = "day", hr = "hour", min = "minute"),
               function(ptn) as.numeric(func(my_interval, ptn)))
res2 <- lapply(res1, function(z) ifelse(is.na(z), 0, z))
res2
# $mon
# [1] 2
# $day
# [1] 3
# $hr
# [1] 6
# $min
# [1] 70
86400 * (res2$mon*31 + res2$day) + 3600*res2$mon + 60*res2$hr
# [1] 5623560

Because I'm using lapply and simple vectorizable operations here, this also works if my_interval is more than one string (of similar format). It is robust to missing variables (presumed 0), and can include "year" (albeit with leap-year inaccuracies) and/or "second" if desired.

intervals <- c("2 month 3 day 6 hour 70 minute", "1 year", "1 hour 1 second")
res1 <- lapply(c(yr = "year", mon = "month", day = "day", hr = "hour", min = "minute", sec = "second"),
               function(ptn) as.numeric(func(intervals, ptn)))
res2 <- lapply(res1, function(z) ifelse(is.na(z), 0, z))
str(res2)
# List of 6
#  $ yr : num [1:3] 0 1 0
#  $ mon: num [1:3] 2 0 0
#  $ day: num [1:3] 3 0 0
#  $ hr : num [1:3] 6 0 1
#  $ min: num [1:3] 70 0 0
#  $ sec: num [1:3] 0 0 1
86400 * (res2$yr*365 + res2$mon*31 + res2$day) + 3600*res2$mon + 60*res2$hr + res2$sec
# [1] 5.6e+06 3.2e+07 6.1e+01
r2evans
  • 141,215
  • 6
  • 77
  • 149
0

My workaround is to use my PostgreSQL connection to do it:

library(sf)
library(RPostgres)

my_postgresql_connection <- dbConnect(Postgres(), dbname = "my_db", host = "my_host", port = 1234, user = "my_user", password = "my_password")
my_interval = '2 month 3 day 6 hour 70 minute'
my_dataframe <- st_read(my_postgresql_connection, query = paste("SELECT EXTRACT(EPOCH FROM '",my_interval,"'::INTERVAL)/60 as minutes"))
my_interval_in_minutes <- as.double(my_dataframe$minutes[1])
MrXsquared
  • 173
  • 4
  • 15