
I want to take advantage of the speed benefits of holding an SQLite database (via SQLAlchemy) in memory while I go through a one-time process of inserting content, and then dump it to a file to be stored and used later.

Consider a bog-standard database created in the usual way:

from sqlalchemy import create_engine

# in-memory database
e = create_engine('sqlite://')

Is there a quicker way of moving its contents to disc, other than just creating a brand new database and inserting each entry manually?

EDIT:

There is some doubt as to whether I'd even see any benefit from using an in-memory database. Unfortunately, I already see a huge time difference of about 120x.

This confusion is probably due to my leaving some important detail out of the question, and probably also to a lack of understanding on my part regarding caches, page sizes, etc. Allow me to elaborate:

I am running simulations of a system I have set up, with each simulation going through the following stages:

  1. Make some queries to the database.
  2. Make calculations / run a simulation based on the results of those queries.
  3. Insert new entries into the database based on the most recent simulation.
  4. Make sure the database is up to date with the new entries by running commit(). (A minimal sketch of this loop follows below.)
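
For concreteness, here is a minimal sketch of that loop. The results table, its single value column, and the trivial placeholder calculation are all hypothetical stand-ins for my actual system, and I'm using SQLAlchemy 2.x-style connections:

from sqlalchemy import create_engine, text

# in-memory engine, as above
engine = create_engine('sqlite://')

with engine.connect() as conn:
    conn.execute(text('CREATE TABLE results (value FLOAT)'))
    for _ in range(1000):  # millions of iterations in the real run
        # 1. Query the results of all previous simulations
        prior = conn.execute(text('SELECT value FROM results')).fetchall()
        # 2. Run a simulation based on those results (placeholder calculation)
        new_values = [float(len(prior) + i) for i in range(12)]
        # 3. Insert the dozen or so new entries
        conn.execute(text('INSERT INTO results (value) VALUES (:v)'),
                     [{'v': v} for v in new_values])
        # 4. Commit so the new rows are visible to later simulations
        conn.commit()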

While I only ever make a dozen or so insertions on each simulation run, I run millions of simulations, and the results of each simulation need to be available for future simulations to take place. As I say, this read-and-write process takes considerably longer with a file-backed database; it's the difference between 6 hours and a month.

Hopefully this clarifies things. I can cobble together a simple Python script to outline my process a little further if necessary.

JimmidyJoo

1 Answer


SQLAlchemy and SQLite know how to cache and do batch-inserts just fine.
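
For example, passing a list of parameter dictionaries to a single execute() call lets SQLAlchemy issue one executemany batch rather than a dozen separate INSERT statements; the results table here is just a hypothetical stand-in:

from sqlalchemy import create_engine, MetaData, Table, Column, Integer, Float

engine = create_engine('sqlite:///simulations.db')  # file-backed
metadata = MetaData()
# hypothetical table, purely for illustration
results = Table('results', metadata,
                Column('id', Integer, primary_key=True),
                Column('value', Float))
metadata.create_all(engine)

with engine.begin() as conn:  # one transaction, one commit
    # one executemany round-trip instead of twelve single inserts
    conn.execute(results.insert(), [{'value': float(v)} for v in range(12)])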

There is no benefit in using an in-memory SQLite database here, because that database uses pages just like the on-disk version would; the only difference is that those pages eventually get written to disk for a disk-based database. The difference in performance is only about 1.5x; see SQLite Performance Benchmark -- why is :memory: so slow...only 1.5X as fast as disk?

There is also no way to move the in-memory database to a disk-based database at a later time, short of running queries on the in-memory database and executing batch inserts into the disk-based database on two separate connections.
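
If you did want to copy an in-memory database to a file that way, a sketch of the two-connection approach could look like the following; it assumes the whole dataset fits in memory and uses the SQLAlchemy 1.4+ Row._mapping accessor:

from sqlalchemy import create_engine, MetaData

mem_engine = create_engine('sqlite://')           # in-memory source
disk_engine = create_engine('sqlite:///dump.db')  # file-backed target

# reflect the schema from the in-memory database and recreate it on disk
metadata = MetaData()
metadata.reflect(bind=mem_engine)
metadata.create_all(bind=disk_engine)

with mem_engine.connect() as src, disk_engine.begin() as dst:
    for table in metadata.sorted_tables:  # respects foreign-key order
        rows = [dict(row._mapping) for row in src.execute(table.select())]
        if rows:
            dst.execute(table.insert(), rows)  # one batch insert per table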

Martijn Pieters
  • I have made an edit to the original question that addresses some of your points. – JimmidyJoo Apr 05 '13 at 14:21
  • Can you not create a data cache and keep the data in memory where helpful? – Martijn Pieters Apr 05 '13 at 14:25
  • Assuming I understand your question correctly, my answer is no. Admittedly, I don't use every piece of information I dump to the database, but I *do* use a piece of information from every single previous simulation that has been run. My simulations are also about to get considerably more complex, using past data in more complicated ways. I had hoped to use SQL's various query commands to help me with some of this. – JimmidyJoo Apr 05 '13 at 14:37
  • Would you say that it sounds like your suggestion of executing batch inserts is my only option at this point? (Apologies for my slow response time, by the way; my ISP is cruising for a bruising.) – JimmidyJoo Apr 05 '13 at 14:43
  • No need to apologize, this is an async medium. You cannot move an in-memory SQLite database to disk at a later time, so you'll have to find other means of speeding this all up. Using some form of caching (which *could* be an in-memory SQLite database *in addition*) would be the direction I'd take. – Martijn Pieters Apr 05 '13 at 14:56
  • Okay, I'll have a think and see what I can come up with. I appreciate your input, many thanks. – JimmidyJoo Apr 05 '13 at 15:14
  • It is not very slow on a local disk, but if the file is on NFS, it is twice as slow. – balki Jun 04 '14 at 23:39