
I'm using SQLite with R (the RSQLite package). After loading RSQLite, I'm trying to pipe 500 GB of data into a SQLite table created with the following standard command:

CREATE TABLE name (
 field1 INTEGER,
 field2 VARCHAR,
 field3 INTEGER,
 field4 INTEGER,
 field5 VARCHAR
);

However, this takes approximately 3 weeks to complete. How does one speed this up? Are other options commonly used?
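A multi-week import usually means each row is committed in its own transaction and/or indexes are being maintained during the load. A minimal sketch of a faster bulk load, using Python's stdlib sqlite3 module for illustration (the same PRAGMAs and statements can be issued from RSQLite, e.g. via DBI's dbExecute); the function name and row source here are hypothetical:

```python
import sqlite3

def bulk_load(db_path, rows):
    """Load rows fast: relaxed durability PRAGMAs plus one big transaction.

    `rows` is any iterable of (field1, field2, field3, field4, field5) tuples.
    The PRAGMA settings trade crash-safety for speed and should only be
    used during the import; re-tune them before serving queries.
    """
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA synchronous = OFF")     # don't fsync on every commit
    conn.execute("PRAGMA journal_mode = MEMORY") # keep the rollback journal in RAM
    conn.execute("""
        CREATE TABLE IF NOT EXISTS name (
            field1 INTEGER,
            field2 VARCHAR,
            field3 INTEGER,
            field4 INTEGER,
            field5 VARCHAR
        )
    """)
    with conn:  # one transaction for the whole batch, not one per row
        conn.executemany("INSERT INTO name VALUES (?, ?, ?, ?, ?)", rows)
    conn.close()
```

Note that any indexes should be created after the load finishes; maintaining them per-insert is a large part of the cost.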

I am trying to run around 100 queries, each in under one second, against 20 billion rows and 5 columns. I am not using multiple commodity machines, but rather a single HPC server with 200 threads and 1 TB of RAM. What would be the correct approach?
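Sub-second point lookups on a table that size are only plausible with an index on the queried column(s); without one, every query scans all 20 billion rows. A small sketch of the pattern, again in stdlib Python sqlite3 for illustration (table and column names follow the schema above; the WHERE value is arbitrary), including an EXPLAIN QUERY PLAN check that the index is actually used:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE name (field1 INTEGER, field2 VARCHAR, "
             "field3 INTEGER, field4 INTEGER, field5 VARCHAR)")
conn.executemany("INSERT INTO name VALUES (?, ?, ?, ?, ?)",
                 [(i, str(i), i, i, str(i)) for i in range(1000)])

# Build the index after the data is loaded, not before.
conn.execute("CREATE INDEX idx_field1 ON name (field1)")

# EXPLAIN QUERY PLAN reports whether SQLite will seek via the index
# (SEARCH ... USING INDEX) or fall back to a full-table SCAN.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM name WHERE field1 = 42"
).fetchall()

rows = conn.execute("SELECT * FROM name WHERE field1 = 42").fetchall()
```

For OR queries, each branch of the OR needs its own usable index, or SQLite will revert to a full scan.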

  • Why not load all the data in memory? It's not clear what is the nature of the queries, but you can create some in memory binary search trees to perform them. – Wagner DosAnjos Jan 06 '17 at 15:14
  • @wdosanjos I'm simply doing OR read queries. " you can create some in memory binary search trees to perform them" What do you mean? – ShanZhengYang Jan 06 '17 at 15:18
  • I think SQLite is not the right approach for your problem. Have you looked into [Apache Solr](http://lucene.apache.org/solr/) or [ElasticSearch](https://www.elastic.co/)? – Wagner DosAnjos Jan 06 '17 at 15:19
  • From Wikipedia: [Search Tree](https://en.wikipedia.org/wiki/Search_tree) – Wagner DosAnjos Jan 06 '17 at 15:21
  • And how exactly are you piping the data? – CL. Jan 06 '17 at 16:20
  • @CL. I'm munging it into csv format, and then creating the table as above – ShanZhengYang Jan 06 '17 at 16:42
  • A CREATE TABLE statement creates an *empty* table. How do you put the data into it? – CL. Jan 06 '17 at 16:43
  • @CL. I'm doing this in R, so I use this: https://cran.r-project.org/web/packages/RSQLite/RSQLite.pdf – ShanZhengYang Jan 06 '17 at 16:49
  • Can you please show the R code? – krlmlr Jan 07 '17 at 08:49
  • Are you using [`read.csv.sql`](https://stackoverflow.com/questions/4332976/how-to-import-csv-into-sqlite-using-rsqlite)? You may want to create the table with R and then close R and [use the sqlite3 command line shell to import the file](https://stackoverflow.com/questions/1727772/quickly-reading-very-large-tables-as-dataframes-in-r/1820610#1820610). I would not create any indexes on your table when you're creating it. Wait until you've imported all your data to do that. – Bacon Bits Sep 18 '17 at 14:55
  • Also, you may find [this question](https://stackoverflow.com/questions/1727772/quickly-reading-very-large-tables-as-dataframes-in-r/1820610#1820610) generally useful given what you're attempting. – Bacon Bits Sep 18 '17 at 14:55
  • [Here](https://stackoverflow.com/questions/6627235/sqlite-import-tab-file-is-import-doing-one-insert-per-row-or-grouping-them-wit) is a decent question showing how to import CSVs into SQLite with the SQLite command shell. Note that the pragma settings, particularly synchronous, should only be set that way during the import. They should be re-tuned afterwards to better suit optimizing your reads. – Bacon Bits Sep 18 '17 at 15:01
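Following the last comment above, the import-time PRAGMAs should be reverted once loading is done and the connection re-tuned for reads. A hedged sketch of one possible post-import configuration (stdlib Python sqlite3 for illustration; the function name is hypothetical and the cache/mmap sizes are examples to adjust for a 1 TB RAM machine):

```python
import sqlite3

def tune_for_reads(conn):
    """Revert import-time speed hacks and favor concurrent reads."""
    conn.execute("PRAGMA synchronous = NORMAL")    # safer than OFF, still fast
    conn.execute("PRAGMA journal_mode = WAL")      # readers don't block each other
    conn.execute("PRAGMA cache_size = -2000000")   # ~2 GiB page cache (negative = KiB)
    conn.execute("PRAGMA mmap_size = 17179869184") # memory-map up to 16 GiB of the file
```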

0 Answers