28

I have a reasonably large data set and would like to store it in a file rather than a RDBMS.

The main table in the data set is just over 1M rows, 30 columns and about 600Mb in size as a CSV.

I'm considering SQLite. Is SQLite worth investigating for data sets of this size?

Mark Nold
  • 5,638
  • 7
  • 31
  • 33

5 Answers5

26

SQLite will handle that file just fine; make sure to import the records in a transaction so that it doesn't spend a lot of time creating indexes until everything is imported.

Ana Betts
  • 73,868
  • 16
  • 141
  • 209
15

You already have your answer but I'd like to share my current experiment: I've dumped billions of records worth 793 GiB of data into a single SQLite database and read queries are still surprisingly fast (under 1m).

Creation time took little over 22 hours and the post-index creation takes about 4 hours per column.

Alix Axel
  • 151,645
  • 95
  • 393
  • 500
10

I investigated SQLite recently for a similar application. The SQLite documentation states that SQLite databases can be terabytes in size, and that the primary limitation of SQLite is concurrency (many users at the same time). Although we didn't go this direction (we have our own binary storage format), I was pretty confident that SQLite could handle files of this size.

Robert Harvey
  • 178,213
  • 47
  • 333
  • 501
9

sqlite is fast when you use transactions and don't commit too often. Using prepared statements with parameters speeds things up too. Sqlite doesn't have to reparse each sql statement when you use parameterized queries. An example: How do I get around the "'" problem in sqlite and c#?

I store 2 gigabyte in an sqlite db, it works very well. A big advantage of sqlite above a flat file is the possibility to index your data.

Community
  • 1
  • 1
Theo
  • 819
  • 4
  • 6
5

sqlite should work just fine for you. I have run a data set of that size even in an embedded device and sqlite performance was quite reasonable.

As stated, the main bottleneck is concurrency. Aim to design your system so that there is at most one database handle open per database file.

laalto
  • 150,114
  • 66
  • 286
  • 303