5

I am new to RSQLite. I have an input document in text format in which values are separated by '|'. I created a table with the required variables (dummy code follows):

library(RSQLite)

db <- dbConnect(SQLite(), dbname = "test.sqlite")

dbSendQuery(conn = db,
"CREATE TABLE TABLE1(
MARKS INTEGER,
ROLLNUM INTEGER,
NAME CHAR(25),
DATED DATE)"
)

However, I am stuck at how to import values into the created table. I cannot use an INSERT INTO ... VALUES command because there are thousands of rows and more than 20 columns in the original data file, and it is impossible to manually type in each data point.

Can someone suggest an alternative efficient way to do so?

Jaap
Sanju

3 Answers

5

You are using a scripting language. The whole point of this is to avoid manually typing each data point. Sorry.

You have two routes:

1: You have correctly loaded a database connection and created an empty table in your SQLite database. Nice!

To load data into the table, load your text file into R using e.g. df <- read.table('textfile.txt', sep='|') (modify arguments to fit your text file).
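For instance, a minimal sketch, assuming the file has no header row (the file name and column names here are placeholders; adjust them to your actual 20+ columns):

# Read a pipe-delimited text file; 'textfile.txt' and the column
# names are assumptions, so adapt them to your actual file.
df <- read.table('textfile.txt', sep = '|', header = FALSE,
                 col.names = c('mark', 'roll', 'name', 'dated'),
                 stringsAsFactors = FALSE)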

To have a 'dynamic' INSERT statement, you can use placeholders. RSQLite allows both named and positional placeholders. To insert a single row, you can do:

dbSendQuery(db, 'INSERT INTO table1 (MARKS, ROLLNUM, NAME) VALUES (?, ?, ?);', params = list(1, 16, 'Big fellow'))

You see? The first ? got the value 1, the second ? the value 16, and the last ? the string Big fellow. Also note that you do not enclose placeholders for text in quotation marks (' or ")!

Now, you have thousands of rows. Or just more than one. Either way, you can send in your data frame. dbSendQuery has two requirements: 1) each vector must have the same number of entries (not an issue when providing a data.frame), and 2) you may only submit the same number of vectors as you have placeholders.

I assume your data frame df contains the columns mark, roll, and name, corresponding to the table's columns. Then you may run:

dbSendQuery(db, 'INSERT INTO table1 (MARKS, ROLLNUM, NAME) VALUES (:mark, :roll, :name);', params = df)

This will execute an INSERT statement for each row in df!

TIP! Because an INSERT statement is executed for each row, inserting thousands of rows can take a long time: after each insert, data is written to file and indices are updated. Instead, enclose it in a transaction:

dbBegin(db)
res <- dbSendQuery(db, 'INSERT ...;', df)
dbClearResult(res)
dbCommit(db)

and SQLite will write the data to a journal file, only saving the result when you execute dbCommit(db). Try both methods and compare the speed!
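If you want to quantify the difference, here is a rough sketch using system.time, reusing df and the statement from above (note that each run inserts the rows again, and timings will vary with your data):

# Without an explicit transaction: SQLite commits after every row.
system.time({
  res <- dbSendQuery(db, 'INSERT INTO table1 (MARKS, ROLLNUM, NAME) VALUES (:mark, :roll, :name);', params = df)
  dbClearResult(res)
})

# With an explicit transaction: a single commit at the end.
system.time({
  dbBegin(db)
  res <- dbSendQuery(db, 'INSERT INTO table1 (MARKS, ROLLNUM, NAME) VALUES (:mark, :roll, :name);', params = df)
  dbClearResult(res)
  dbCommit(db)
})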


2: Ah, yes. The second way. This can be done entirely in SQLite. With the SQLite command-line utility (sqlite3 from your command line, not R), you can attach a text file as a table and simply do an INSERT INTO ... SELECT ...; command. Alternatively, read the text file in sqlite3 into a temporary table and run an INSERT INTO ... SELECT ...;.
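A sketch of what such a session might look like (data.txt and temp_import are hypothetical names; .import creates temp_import if it does not exist, treating the first line of the file as column names):

.separator |
.import data.txt temp_import
INSERT INTO TABLE1 SELECT * FROM temp_import;
DROP TABLE temp_import;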


Useful site to remember: http://www.sqlite.org/lang.html

MrGumble
  • This doesn't answer the question. – Tim Biegeleisen Sep 20 '17 at 08:48
  • 1
    How does this not answer the question? – MrGumble Sep 20 '17 at 08:51
  • 1
    `Can someone suggest an alternative efficient way to do so?` ... in the database world generating an INSERT script with thousands of statements, or more, would not be considered efficient. On any major RDBMS your answer would not be the preferred way to go, either from a performance or from an accuracy point of view. – Tim Biegeleisen Sep 20 '17 at 08:55
  • 1
    And my answer is an efficient way of doing so within the `RSQLite` framework that he is asking. – MrGumble Sep 20 '17 at 08:57
2

A little late to the party, but DBI provides dbAppendTable(), which will write the contents of a data frame to an SQL table. Column names in the data frame must match the field names in the database. For your example, the following code would insert the contents of my random data frame into your newly created table.

library(DBI)

db <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")

dbExecute(db,
          "CREATE TABLE TABLE1(
             MARKS INTEGER,
             ROLLNUM INTEGER,
             NAME TEXT
           )"
)

df <- data.frame(MARKS = sample(1:100, 10), 
                 ROLLNUM = sample(1:100, 10), 
                 NAME = stringi::stri_rand_strings(10, 10))

dbAppendTable(db, "TABLE1", df)
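To tie this back to the question, you could then read the original pipe-delimited file and append it the same way (a sketch; the file name is hypothetical, and the file is assumed to have a header row matching the table's field names):

# Read the pipe-delimited file and append all rows in one call.
df <- read.table("data.txt", sep = "|", header = TRUE)
dbAppendTable(db, "TABLE1", df)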
blongworth
  • The question is about an 'input document in text format', not a data.frame. Reading that big a file in R isn't efficient. – Salix Nov 22 '21 at 04:18
  • I do not understand your comment. There is nothing big about "thousands of rows and more than 20+ columns". `dbAppendTable()` takes a data.frame as input, therefore, the OP could easily read their text file into a data.frame and use `dbAppendTable()` to insert their rows in a single transaction. This is much more efficient than the accepted answer's approach. – blongworth Nov 26 '21 at 23:00
  • It can actually take a while for R to read that into memory. I don't see an accepted answer, so not sure which approach you mean. – Salix Nov 27 '21 at 21:06
  • the related [post](https://stackoverflow.com/questions/25194568/how-to-import-tab-delimited-data-to-sqlite-using-rsqlite) commented on the question has the most efficient way – Salix Nov 27 '21 at 21:16
  • I was referring to @MrGumble's answer. Using `dbWriteTable()` as in the post you mention works if OP is creating a new table, but not if they are appending data to an existing table. I benchmarked reading from file and using `dbAppendTable()` [here](https://gist.github.com/blongworth/e8b3316fd42989be4793c533df5a81ba). 134 us for 100k rows. – blongworth Nov 28 '21 at 13:18
  • OP was using 'CREATE TABLE' so I'm guessing it's a new one. They also mention 20+ columns. If you want real examples of heavy files, I'd try benchmarking with the taxdump files of NCBI (ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz). The names.dmp and nodes.dmp files are bearable, being under 1GB, but `dbWriteTable()` is still better. The 'nucl_gb.accession2taxid' file on the other hand, being 5.93GB? It's a lot... – Salix Nov 30 '21 at 17:51
0

I don't think there is a nice way to do a large number of inserts directly from R. SQLite does have bulk insert functionality, but the RSQLite package does not appear to expose it.

From the command line you may try the following:

.separator |
.import your_file.csv your_table

where your_file.csv is the CSV (or pipe-delimited) file containing your data and your_table is the destination table.
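If you would rather stay inside R, you could shell out to the same tool (a sketch; it assumes sqlite3 is on your PATH, is a recent version that accepts dot-commands as command-line arguments, and that the file and table names from above are placeholders for yours):

# Run the import non-interactively via the sqlite3 CLI.
system('sqlite3 test.sqlite ".separator |" ".import your_file.csv your_table"')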

See the documentation under CSV Import for more information.

Tim Biegeleisen
  • `RSQLite` does support inserting (or selecting!) via e.g. a data frame or vector, see my answer. And thanks for providing your answer, as that was my 2nd approach, but I got too lazy to write it out in full. – MrGumble Sep 20 '17 at 08:47
  • 1
    @MrGumble No, it doesn't appear to support bulk insert and the whole point of the OP's question was to avoid a messy and inefficient script to generate insert statements. – Tim Biegeleisen Sep 20 '17 at 08:48
  • What do you mean; what part of my answer shows that `RSQLite` does not support bulk insertions? It literally *does* support bulk insertions, see my answer. `RSQLite` supports both named and unnamed placeholders in both INSERT and SELECT statements, allowing the user to supply vectors of values to insert into the placeholders. – MrGumble Sep 20 '17 at 08:50
  • @TimBiegeleisen: Check the new [`dbBind()`](https://cran.r-project.org/web/packages/DBI/vignettes/spec.html#_bind_values_to_a_parameterizedprepared_statement_), especially the examples therein. Should be fast enough for many applications. – krlmlr Sep 21 '17 at 06:29