1

on a SQLite database with weather informations (colums: DATE, DOY, TEMPERATURE) I like to add a new column with temperature sums (the sum of daily TEMPERATURE from January 1 to the current day). (DOY is the day of the year.)

The dplyr windows functions should be suitable to solve this problem. http://cran.r-project.org/web/packages/dplyr/vignettes/window-functions.html. Perhaps something like

mutate(TEMPERATURESUM = lag(TEMPERATURE, DOY, order_by=DATE))

but it gives an error in statement: no such function: LAG

Do you know an example from which I see how I'm doing it correct? I do not like to use pure SQL where there are so comfortable ways with dplyr.

thx Christof

ckluss
  • 1,477
  • 4
  • 21
  • 33
  • 1
    A simple test with `mtcars %>% mutate(lagged = lag(mpg, 1, order_by=carb))` seems to work. What type of object are you piping to `mutate`? What dplyr version are you using? – MrFlick Dec 18 '14 at 06:04
  • Could you please provide a reproducible example, including the dataset? In any case, from the dataset it sounds like you are misspelling `lag` as `LAG` – David Robinson Dec 18 '14 at 06:05
  • Are you looking for cumsum of temperature from jan 1 to current day perhaps? – talat Dec 18 '14 at 06:06
  • @MrFlick the object is a SQLite Database, @David `lag` is converted to LAG in SQL (SQL is case insensitivity) and LAG does not exist in SQL :( @docendo thx for the tipp, but unfortunately I get the same error with `cumsum`because it seems not to be translated in an SQL equivalent – ckluss Dec 18 '14 at 06:14
  • Are you opposed to doing this with your own SQL code? There are a few SO answers on cumulative sums with sqlite, seems to be pretty straightforward, e.g. [this recent one](http://stackoverflow.com/questions/21382766/cumulative-summing-values-in-sqlite). – andybega Dec 18 '14 at 08:47
  • SQLite does not support window functions, so you can't use `lag()` etc with it – hadley Dec 18 '14 at 18:58

1 Answers1

1

If you are not opposed to using raw SQL code, you could do this using library(rsqlite):

# SQL statements we want to run
sql1 <- "
ALTER TABLE t
ADD COLUMN tempsum real;"

sql2 <- "
UPDATE t
SET tempsum = (SELECT Sum(t2.temperature) 
               FROM t as t2 WHERE t2.doy <= t.doy);" 

dbSendQuery(conn, sql1)
dbSendQuery(conn, sql2)

This is adapted from this SO question about cumulative sums in SQLite, and an example showing it works on SQL Fiddle (which I learned about here).

But then I guess you could also just calculate the cumulative sum in R and write the results back to SQLite, if the table is not too big, so I'm not sure if your question was specifically about how to do this with dplyr, or how to do it in general.

Community
  • 1
  • 1
andybega
  • 1,387
  • 12
  • 19
  • thank you very much for the example. I will probably do it that way. Maybe it is not currently possible with `dplyr`in a simpler way _on the fly_ – ckluss Dec 18 '14 at 11:58