2

I have a data frame in a database with a date dimension that I pulled using RPostgres/RpostgreSQL and dbplyr. Each date is in the format 'YYYY-MM-DD', and I need to add a new date (or character) dimension that reflects the original date in the year-quarter format "YYYY-Q" (with a dash, not a period).

Given that I can't use lubridate with dbplyr, how would I accomplish this?

Thanks for your help! Here's a simplified version of what I have so far so you can see what packages I'm using and how I'm connecting to the database.

# Packages
library(RPostgres)
library(RPostgreSQL)
library(dplyr)
library(dbplyr)

# Connect to db 
drv <- dbDriver("Postgres")

# Setup connect to db
conn <- dbConnect(drv,
                  dbname = etc,)

# Define table to use in db
table <- tbl(conn, in_schema("xyz", "abc"))

#Select columns and filter
df <- table %>%
  #Filter for pertinent data
  filter(date > as.Date("2018-01-01") & date <= as.Date("2020-01-01")) 

 
  • FYI, from your [since-deleted question](https://stackoverflow.com/q/66958248/3358272), you mentioned you needed `RPostgreSQL` in order to get `dbConnect`. Not required. You can get that function in both of the [`RPostgres`](https://github.com/r-dbi/RPostgres/blob/main/NAMESPACE#L22) and [`DBI`](https://github.com/r-dbi/DBI/blob/main/NAMESPACE#L20) packages. – r2evans Apr 06 '21 at 00:03

3 Answers3

2

A bunch of lubridate functions are given SQL translations in dbplyr. So the code below works for me.

An advantage of doing this way is that if used with care, you can shift the location of data to the most efficient location (on the server or locally).

library(dplyr, warn.conflicts = FALSE)
library(DBI)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

pg <- dbConnect(RPostgres::Postgres(), bigint = "integer")
calls <- tbl(pg, sql("SELECT * FROM streetevents.calls"))

calls %>%
    mutate(yq = paste0(year(start_date), "-", quarter(start_date))) %>%
    select(start_date, yq)
#> # Source:   lazy query [?? x 2]
#> # Database: postgres [iangow@/tmp:5432/crsp]
#>    start_date          yq    
#>    <dttm>              <chr> 
#>  1 2013-09-10 19:30:00 2013-3
#>  2 2003-10-22 15:00:00 2003-4
#>  3 2009-10-22 16:00:00 2009-4
#>  4 2017-02-09 06:00:00 2017-1
#>  5 2010-02-22 22:00:00 2010-1
#>  6 2016-08-08 20:30:00 2016-3
#>  7 2016-05-11 13:00:00 2016-2
#>  8 2012-05-15 16:20:00 2012-2
#>  9 2004-08-19 21:00:00 2004-3
#> 10 2017-07-06 13:30:00 2017-3
#> # … with more rows

calls %>%
    collect(n = 10) %>%
    mutate(yq = paste0(year(start_date), "-", quarter(start_date))) %>%
    select(start_date, yq)
#> # A tibble: 10 x 2
#>    start_date          yq    
#>    <dttm>              <chr> 
#>  1 2013-09-10 19:30:00 2013-3
#>  2 2003-10-22 15:00:00 2003-4
#>  3 2009-10-22 16:00:00 2009-4
#>  4 2017-02-09 06:00:00 2017-1
#>  5 2010-02-22 22:00:00 2010-1
#>  6 2016-08-08 20:30:00 2016-3
#>  7 2016-05-11 13:00:00 2016-2
#>  8 2012-05-15 16:20:00 2012-2
#>  9 2004-08-19 21:00:00 2004-3
#> 10 2017-07-06 13:30:00 2017-3

Created on 2021-04-03 by the reprex package (v1.0.0)

Ian Gow
  • 3,098
  • 1
  • 25
  • 31
  • I'm unable to write to our Redshfit servers for some inexplicable reason, so bringing the data locally then putting it back into the server isn't an option. I'm not sure why I can't write to our servers when a co-worker of mine is able to do so with no problem... – NoCaresBear Apr 05 '21 at 13:19
  • The code above does not write back to the database. Sometimes it can be more efficient to process data on the server and then bring the data into R. You may not be able to write to the database because of permissions. For some use cases, the answer I give here (to my own question) may be helpful: https://stackoverflow.com/questions/66910165/how-do-i-get-local-data-into-read-only-database-using-dbplyr – Ian Gow Apr 05 '21 at 23:32
1

Can you use zoo's as.yearqtr ?

zoo::as.yearqtr(Sys.Date())
#[1] "2021 Q2"

to get data in specific format you can use format :

format(zoo::as.yearqtr(Sys.Date()), '%Y-%q')
#[1] "2021-2"
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • I'm having trouble converting that column of dates using as.yearqtr. How would I apply that to the dates in my remote table "df"? I'm new to R. Thanks for your patience. – NoCaresBear Apr 02 '21 at 14:35
  • What happens when you do `df <- df %>% mutate(new_date = zoo::as.yearqtr(date))` – Ronak Shah Apr 02 '21 at 23:56
1

Because dbplyr does not translate lubridate, the approach I often use for date manipulation is small fragments of SQL. You can see an example of this here.

As you need to extract year and quarter from an existing date, first step is to identify postgresql code fragments that do this. Note that different versions of SQL use different functions when working with dates, so this code will be postgresql specific.

Based on this link and this link, there appear to be two ways of pulling the components of a date out:

  1. EXTRACT(YEAR FROM date_column)
  2. DATE_PART('year', date_column)

I will use the first approach below.

As I am using SQL fragments, I will also do concatenation using SQL as per this link. For example: CONCAT(year_column '-', quarter_column). So my output will be a column of type text.

Combining these together gives:

library(dplyr)
library(dbplyr)

df = data.frame(my_num = c(1,2,3), my_dates = c('2000-01-01','2000-02-02','2000-03-03'))

df = tbl_lazy(df, con = simulate_postgres()) # simulated remote table

output = df %>%
  mutate(the_quarter = sql("EXTRACT(QUARTER FROM my_dates)"),
         the_year = sql("EXTRACT(YEAR FROM my_dates)")) %>%
  mutate(quarter = CONCAT(the_year, '-', the_quarter))

Calling show_query(output) allows us to check the generated postgresql query:

SELECT `my_num`
    , `my_dates`
    , `the_quarter`
    , `the_year`
    , CONCAT(`the_year`, '-', `the_quarter`) AS `quarter`
FROM (
    SELECT `my_num`
        , `my_dates`
        , EXTRACT(QUARTER FROM my_dates) AS `the_quarter`
        , EXTRACT(YEAR FROM my_dates) AS `the_year`
    FROM `df`
) `q01

But not as nicely formatted. As best I can tell, this is a valid postgresql function, so we can expect it to work.

Depending on your application, you might also want to consider this question, and in particular this answer. As there may be better alternatives to the 'YYYY-Q' format.

Simon.S.A.
  • 6,240
  • 7
  • 22
  • 41
  • Thanks for responding. Running the first mutate function successfully extracted the quarter (I assume you meant that and not month) and year from the dates in integer format. However, the second mutate with CONCAT did not work. I get the following error: "Error: Failed to prepare query: ERROR: function concat_ws("unknown", integer, integer) does not exist HINT: No function matches the given name and argument types. You may need to add explicit type casts." – NoCaresBear Apr 02 '21 at 14:33
  • Two options: (1) wrap `CONCAT(...)` inside `sql(.)` as we did for the `EXTRACT` code. (2) as `CONCAT_WS` appears to require text, not numeric input, you may have to do something like `mutate(the_year = as.character(the_year))` before you can concatenate it. – Simon.S.A. Apr 02 '21 at 22:15
  • You are correct, I meant quarter, but typed month. Answer edited to correct this. – Simon.S.A. Apr 02 '21 at 22:16
  • CONCAT can successfully concatenate the_year and the_quarter, but I can't seem to bring a hyphen into the mix. Here's that code: df %>% mutate(the_quarter = sql("EXTRACT(QUARTER FROM date)"), the_year = sql("EXTRACT(YEAR FROM date)")) %>% mutate(the_quarter = as.character(the_quarter), the_year = as.character(the_year)) %>% mutate(xyz = sql("CONCAT(the_year, the_quarter)")) %>% – NoCaresBear Apr 02 '21 at 22:23
  • This CONCAT_WS concatenate code doesn't want to work... `mutate(xyz = sql("CONCAT_WS('-', the_year, the_quarter)")`. When I run that I get a similar error: `Error: Failed to prepare query: ERROR: function concat_ws("unknown", character varying, character varying) does not exist HINT: No function matches the given name and argument types. You may need to add explicit type casts.` – NoCaresBear Apr 02 '21 at 22:26
  • Perhaps try `paste0` instead of `CONCAT`? – Simon.S.A. Apr 04 '21 at 07:01
  • Alternatively, your question inspired me to write a pure R approach see this answer: https://stackoverflow.com/a/66938792/7742981 – Simon.S.A. Apr 04 '21 at 07:04