Questions tagged [sqldf]

SQLDF is an R package for running SQL statements on R data frames.

SQLDF is an R package for running SQL statements on R data frames.

The user simply specifies an SQL statement in R using data frame names in place of table names and a database with appropriate table layouts/schema is automatically created, the data frames are automatically loaded into the database, the specified SQL statement is executed, the result is read back into R and the database is deleted all automatically behind the scenes.

sqldf supports:

  • the SQLite backend database (by default),
  • the H2 java database,
  • the PostgreSQL database and
  • sqldf 0.4-0 onwards also supports MySQL.

Resources:

684 questions
119
votes
8 answers

Select the first row by group

From a dataframe like this test <- data.frame('id'= rep(1:5,2), 'string'= LETTERS[1:10]) test <- test[order(test$id), ] rownames(test) <- 1:10 > test id string 1 1 A 2 1 F 3 2 B 4 2 G 5 3 C 6 3 H …
dmvianna
  • 15,088
  • 18
  • 77
  • 106
74
votes
4 answers

Summarize all group values and a conditional subset in the same call

I'll illustrate my question with an example. Sample data: df <- data.frame(ID = c(1, 1, 2, 2, 3, 5), A = c("foo", "bar", "foo", "foo", "bar", "bar"), B = c(1, 5, 7, 23, 54, 202)) df ID A B 1 1 foo 1 2 1 bar 5 3 2 foo 7 4 2 foo …
kevinykuo
  • 4,600
  • 5
  • 23
  • 31
23
votes
3 answers

Best use of R and SQL if restricted to a local machine

I’m trying to improve my workflow, and am hoping the community can provide insights since I am relatively new to “big data”. I typically download several dataframes from public sources which can be related to one another. After pulling several…
Jayden.Cameron
  • 499
  • 4
  • 17
22
votes
4 answers

R: how to rbind two huge data-frames without running out of memory

I have two data-frames df1 and df2 that each have around 10 million rows and 4 columns. I read them into R using RODBC/sqlQuery with no problems, but when I try to rbind them, I get that most dreaded of R error messages: cannot allocate memory.…
Prasad Chalasani
  • 19,912
  • 7
  • 51
  • 73
15
votes
2 answers

Using sqldf and RPostgreSQL together

When using RPostgreSQL I find that I cannot use sqldf in the same way. For example if I load the library and read in data into a data frame using the following code: library(RPostgreSQL) drv <- dbDriver("PostgreSQL") con <- dbConnect(drv,…
djq
  • 14,810
  • 45
  • 122
  • 157
13
votes
2 answers

SQL-like functionality in R

I am used to writing data manipulation logic in SQL and now that I am learning R I find myself sometimes just wanting to do something that would be simple in SQL but I have to learn a bunch of stuff with R to do the same manipulation on an R data…
medriscoll
  • 26,995
  • 17
  • 40
  • 36
11
votes
3 answers

How can I pass R variable into sqldf?

I have some query like this: sqldf("select TenScore from data where State_P = 'AndhraPradesh'") But I have "AndhraPradesh" in a variable stateValue. How can I use this variable in a select query in R to get the same result as above. Please show me…
Sandeep
  • 663
  • 2
  • 8
  • 18
11
votes
4 answers

Error: Cannot pass NA to dbQuoteIdentifier() in sqldf package in R

Error: Cannot pass NA to dbQuoteIdentifier() In addition: Warning message: In field_types[] <- field_types[names(data)] : number of items to replace is not a multiple of replacement length This is the error message i am getting upon trying to…
10
votes
2 answers

Failed to connect the database when using sqldf in r

I loaded a csv file to my R, and when I Tried to use sqldf to select some column, it always went to Error in .local(drv, ...) : Failed to connect to database: Error: Access denied for user 'User'@'localhost' (using password: NO) Error in…
victordongy
  • 325
  • 2
  • 4
  • 13
9
votes
1 answer

sqldf: Changes timestamp from localtime to GMT/UTC

On my machine, sqldf changes a POSIXct from localtime(I'm at CST, GMT -0600) to GMT/UTC. Is this expected behavior? Can I stop R or sqldf from doing this? Here is my code: > library('sqldf') > > before <- data.frame(ct_sys = Sys.time()) >…
mpettis
  • 3,222
  • 4
  • 28
  • 35
8
votes
1 answer

Any way to automatically correct all variable classes in a dataframe

I have a dataframe with about ~250 variables. Unfortunately, all of these variables were imported as character classes from a sql database using sqldf. The problem: all of them should not be character classes. There are numeric variables, integers,…
jgozal
  • 1,480
  • 6
  • 22
  • 43
8
votes
1 answer

How can I keep a date formatted in R using sqldf?

How do I rename a date field in SQLDF without changing the format? See my example below where my renamed date field "dt" converts the date to a number. How do I avoid this, or convert it back to a date? #Question for Stack Exchange df <- data.frame…
Chris L
  • 338
  • 1
  • 4
  • 15
8
votes
1 answer

How to append several large data.table objects into a single data.table and export to csv quickly without running out of memory?

The easy answer to this is "buy more RAM" but I am hoping to get a more constructive answer and learn something in the process. I am running Windows 7 64-bit with 8GB of RAM. I have several very large .csv.gz files (~450MB uncompressed) with the…
Brian D
  • 2,570
  • 1
  • 24
  • 43
7
votes
2 answers

Built Family nested tree parent / children relationship in R

I am working on families trees : I have adapted Bob Horton's example based on sqldf https://www.r-bloggers.com/exploring-recursive-ctes-with-sqldf/ My data : person father Guillou Arthur NA Cleach Marc …
Wilcar
  • 2,349
  • 2
  • 21
  • 48
7
votes
4 answers

Find nearest matches for each row and sum based on a condition

Consider the following data.table of events: library(data.table) breaks <- data.table(id = 1:8, Channel = c("NP1", "NP1", "NP2", "NP2", "NP3", "NP3", "AT4", "AT4"), Time = c(1000, 1100, 975, 1075, 1010,…
Dahbid
  • 73
  • 1
  • 5
1
2 3
45 46