6
arm<-as.data.frame(matrix(c(1,1,1,2,2,6,7,4,9,10),ncol=2))

colnames(arm)<-c("a","b")

This is a dataset I created in R.

Now I want to rank the column b and group by column a.

The following piece of code is throwing this error, no matter what changes I make to the syntax(like adding [], "", etc...)

Error in sqliteSendQuery(con, statement, bind.data) : error in statement: near "(": syntax error

I was using "sqldf" package.

arm2<-sqldf("select a,
         b,
         rank() over (partition by a order by b) as rank1 
         from arm")

Then I installed the RH2 package and it started to throw the following error:

Error in .verify.JDBC.result(s, "Unable to execute JDBC statement ", statement) : Unable to execute JDBC statement select a, b, rank() over (partition by a order by b) as rank1 from arm (Function "rank" not found; SQL statement: select a, b, rank() over (partition by a order by b) as rank1 from arm [90022-175])

How to use rank() over function of sql in sqldf package of R?

peterh
  • 11,875
  • 18
  • 85
  • 108
Bill Stoyanov
  • 79
  • 1
  • 3
  • 4
    Downvote seems a bit harsh. The question outlines the package used, the data used, the code used, and the error messages. Considering some of the crap that gets upvoted here, it shows effort to explain the issue. – thelatemail Sep 02 '15 at 22:50
  • 5
    rsqldf uses sqlite by default, which doesn't have a rank() function. I'm not familiar with RH2, but it seems like ti has the same problem. You can try postgresql following the examples [from here](https://github.com/ggrothendieck/sqldf#12-how-does-one-use-sqldf-with-postgresql) – jeremycg Sep 02 '15 at 23:04
  • I don't understand this question. Is this a bug report? If so, please report on [GH](https://github.com/ggrothendieck/sqldf/issues). Or are you looking for a general solution? In that case, this error should be only relevant in order to show your initial effort, and thus you can use some other alternatives such as `arm$rank1 <- unlist(with(arm, tapply(b, a, rank)))`, for example. – David Arenburg Sep 02 '15 at 23:47
  • @DavidArenburg - `unlist(tapply(...))` would be better replaced by `ave`, so it is isn't dependent on the order of the dataset. – thelatemail Sep 03 '15 at 00:42
  • @thelatemail yeah, you have a point there – David Arenburg Sep 03 '15 at 00:48
  • Hi David, I do know how to do with other methods. But the thing I wanted here was "how to do it with sql". Thanks a lot jeremycyg and thelatemail. @DavidArenburg – Bill Stoyanov Sep 03 '15 at 03:26
  • Thanks a lot @jeremycg – Bill Stoyanov Sep 03 '15 at 03:28

1 Answers1

3

sqldf uses SQLite which does not support the rank() function - see here. From the error message you got from H2, it does not either, though it is currently planned.

sqldf has capability to use PostgreSQL rather than SQLite, which does support rank(): see here for an example. Your code as posted should then work.

If you don;t want to use PostgreSQL, you can get the data out in the right order with SQLite and sqldf using:

sqldf("select a, b from arm 
          order by a, b", drv = "SQLite")

but the ranking column is more difficult - see some related answers: 1, 2, 3

Since you are already in R, you could use dplyr, a native R package:

library(dplyr)
arm %>% group_by(a) %>%
        mutate(rank = rank(b))

Or data.table, a faster alternative:

library(data.table)
setDT(arm)[ , rank := rank(b), by = a]
jeremycg
  • 24,657
  • 5
  • 63
  • 74
  • 2
    Note that if you really want to use SQLite then: `a1 <- sqldf("select a, b from arm order by a, b"); min_rowid <- sqldf("select x.*, min(y.rowid) min_rowid from a1 x left join a1 y using(a) group by x.a"); sqldf("select a1.*, a1.rowid - min_rowid + 1 rank from a1 join min_rowid using(a)")` – G. Grothendieck Sep 03 '15 at 15:32