
Judging by the answer to my question Is it normal that sqlite.fetchall() is so slow?, it seems that fetchall() and fetchone() can be incredibly slow for sqlite.

As I mentioned there, I have the following query:

time0 = time.time()
self.cursor.execute("""
    SELECT spectrum_id, feature_table_id
    FROM spectrum AS s
    INNER JOIN feature AS f
        ON f.msrun_msrun_id = s.msrun_msrun_id
    INNER JOIN (SELECT feature_feature_table_id, min(rt) AS rtMin, max(rt) AS rtMax,
                       min(mz) AS mzMin, max(mz) AS mzMax
                FROM convexhull GROUP BY feature_feature_table_id) AS t
        ON t.feature_feature_table_id = f.feature_table_id
    WHERE s.msrun_msrun_id = ?
    AND s.scan_start_time >= t.rtMin
    AND s.scan_start_time <= t.rtMax
    AND base_peak_mz >= t.mzMin
    AND base_peak_mz <= t.mzMax""", spectrumFeature_InputValues)
print 'query took:', time.time() - time0, 'seconds'

time0 = time.time()
spectrumAndFeature_ids = self.cursor.fetchall()
print time.time() - time0, 'seconds to fetchall'

The execution of the select statement takes about 50 seconds, which is acceptable. However, the fetchall() takes 788 seconds, while fetching only 981 results.

The approach proposed in the answer to my question Is it normal that sqlite.fetchall() is so slow?, using fetchmany(), did not improve the speed of fetching the results.

How can I speed up fetching the results after running an sqlite query?
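For reference, the fetchmany() batching pattern the linked answer proposed looks roughly like this (a minimal sketch against a throwaway in-memory table; the batch size is arbitrary):

```python
import sqlite3

# Throwaway in-memory database just to demonstrate the pattern.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE spectrum (spectrum_id INTEGER)")
cur.executemany("INSERT INTO spectrum VALUES (?)", [(i,) for i in range(10)])

cur.execute("SELECT spectrum_id FROM spectrum")
results = []
while True:
    batch = cur.fetchmany(4)  # pull rows in fixed-size chunks
    if not batch:             # empty list means the result set is exhausted
        break
    results.extend(batch)
print(len(results))  # 10
```

As noted above, this only changes how the rows are pulled, not how the query itself is planned, which is presumably why it did not help here.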


The SQL exactly as I tried to execute it on the command line:

sqlite> SELECT spectrum_id, feature_table_id
   ...> FROM spectrum AS s 
   ...> INNER JOIN feature AS f 
   ...> ON f.msrun_msrun_id = s.msrun_msrun_id 
   ...> INNER JOIN (SELECT feature_feature_table_id, min(rt) AS rtMin, max(rt) AS rtMax, min(mz) AS mzMin, max(mz) as mzMax 
   ...> FROM convexhull GROUP BY feature_feature_table_id) AS t 
   ...> ON t.feature_feature_table_id = f.feature_table_id 
   ...> WHERE s.msrun_msrun_id = 1
   ...> AND s.scan_start_time >= t.rtMin
   ...> AND s.scan_start_time <= t.rtMax
   ...> AND base_peak_mz >= t.mzMin
   ...> AND base_peak_mz <= t.mzMax;

Update:

I started running the query on the command line about 45 minutes ago and it is still running, so it is also very slow on the command line.
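Since the query is slow even in the client, the planner itself is the likely bottleneck. The join and filter columns above suggest indexes could help; below is a sketch with sqlite3 against a throwaway schema reduced to the columns the query touches (the index names are made up for illustration), using EXPLAIN QUERY PLAN to check that sqlite actually picks them up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Throwaway versions of the tables, reduced to the relevant columns.
cur.executescript("""
CREATE TABLE spectrum (spectrum_id INTEGER, msrun_msrun_id INTEGER,
                       scan_start_time REAL, base_peak_mz REAL);
CREATE TABLE feature (feature_table_id INTEGER, msrun_msrun_id INTEGER);
CREATE TABLE convexhull (feature_feature_table_id INTEGER, rt REAL, mz REAL);

CREATE INDEX idx_spectrum_msrun ON spectrum(msrun_msrun_id, scan_start_time, base_peak_mz);
CREATE INDEX idx_feature_msrun ON feature(msrun_msrun_id);
CREATE INDEX idx_convexhull_feature ON convexhull(feature_feature_table_id);
""")

# EXPLAIN QUERY PLAN shows whether sqlite uses the indexes for this query.
plan = cur.execute("""
EXPLAIN QUERY PLAN
SELECT spectrum_id, feature_table_id
FROM spectrum AS s
INNER JOIN feature AS f ON f.msrun_msrun_id = s.msrun_msrun_id
INNER JOIN (SELECT feature_feature_table_id, min(rt) AS rtMin, max(rt) AS rtMax,
                   min(mz) AS mzMin, max(mz) AS mzMax
            FROM convexhull GROUP BY feature_feature_table_id) AS t
    ON t.feature_feature_table_id = f.feature_table_id
WHERE s.msrun_msrun_id = 1
  AND s.scan_start_time >= t.rtMin AND s.scan_start_time <= t.rtMax
  AND base_peak_mz >= t.mzMin AND base_peak_mz <= t.mzMax
""").fetchall()
for row in plan:
    print(row)
```

If the plan output shows full table scans rather than index searches on the large tables, adding indexes like these may be the cheapest fix.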

Niek de Klein
  • How much time does the same query take when executing it through the client? –  May 02 '12 at 11:35
  • Also, what sqlite3 python module are you using? What version? What is the sqlite3 version used by the module? –  May 02 '12 at 11:44
  • I'm using sqlite module: 2.6.3 and sqlite version: 3.7.10. I'm trying to execute the command through SQLite manager but it doesn't seem to be able to cope with it. – Niek de Klein May 02 '12 at 12:36
  • Just use the client. Are you on Windows or UNIX/Linux? –  May 02 '12 at 12:39
  • I get an sql error: SQL error: malformed database schema - near "NO": syntax error – Niek de Klein May 02 '12 at 12:51
  • can you add that sql exactly as you tried to execute it? –  May 02 '12 at 12:52
  • just to be sure, what does `sqlite> .version` return? –  May 02 '12 at 12:58
  • sqlite> .version unknown command or invalid arguments: "version". Enter ".help" for help – Niek de Klein May 02 '12 at 12:59
  • When you start the client, what version does it print? –  May 02 '12 at 13:04
  • can you put your hands on a more recent client, perhaps the latest version (3.7.11)? –  May 02 '12 at 13:13
  • I'm running the 3.7.10 version with python by importing _sqlite.so (as described http://stackoverflow.com/a/1546162/651779). I have no idea how to run that via commandline, and I can't install a newer version in a different way. – Niek de Klein May 02 '12 at 13:34
  • You don't have to install, you can just grab/compile some binaries from http://sqlite.org/download.html, and use them as-is in you own home directory. The same goes for using some other python module. –  May 02 '12 at 13:38
  • Ok got it now, I'm running the query will update when it's finished – Niek de Klein May 02 '12 at 13:50
  • got the same problem. I could speed up the selection using indexing. But fetching data remain desperately slow even using fetchmany, apsw, sqlite3 version 3.11. The last alternative was to store each column into a binary file, load it with numpy.fromfile, and select data in python. – user2660966 Aug 04 '17 at 12:04

1 Answer


From reading this question, it sounds like you could benefit from using the APSW sqlite module. You may be a victim of your current sqlite module executing your query in some less performant manner.

I was curious so I tried using apsw myself. It wasn't too complicated. Why don't you give it a try?

To install it I had to:

  1. Download and extract the latest version.
  2. Have the installation package fetch the latest sqlite amalgamation.

    python setup.py fetch --sqlite
    
  3. Build and install.

    sudo python setup.py install
    
  4. Use it in place of the other sqlite module.

    import apsw
    <...>
    conn = apsw.Connection('foo.db')
    