I have written some sample script that create your database and go through all its elements. And it looks like it works much faster than your wrote in comments. Are you sure that database access is a bottleneck? Maybe in your script you do something more and this takes so much time.
I have checked 2 databases SQLite and MongoDB with 5 millions of items.
For SQLite inserting all rows took ~1200 seconds and selection them around 300 seconds.
MongoDB was faster and insert took ~400 seconds while select less than 100 seconds.
Please check your code with my samples and check if your select is similar. I used cursor instead of LIMIT/OFFSET.
If this still doesn't help then I think MongoDB is worth a shot. It has one disadvantage - it require 64-bit OS to support large database (like yours). If you newer used it before then here is shortest installation guide for windows:
And here are my python 3.x test scripts for SQLite
import sqlite3
from time import time
conn = sqlite3.connect('test.dbase')
c = conn.cursor()
c.execute("""PRAGMA journal_mode = OFF""")
c.execute("""PRAGMA synchronous = 0""")
c.execute("""PRAGMA locking_mode = EXCLUSIVE""")
c.execute("""PRAGMA count_change = OFF""")
c.execute("""PRAGMA temp_store = 2""")
c.execute("""CREATE TABLE tmpTranscripts_arm_3R_transcripts (id INTEGER PRIMARY KEY, name varchar(255), chromosome varchar(255), start int(11), end int(11), direction tinyint(4), tags varchar(1023), bin int(11), exons varchar(10000))""")
c.execute("""CREATE INDEX 'iTranscript_arm_3R_14943' ON 'tmpTranscripts_arm_3R_transcripts' (start, end, direction)""")
t1 = time()
for i in range(0, 5000000):
c.execute("""INSERT INTO tmpTranscripts_arm_3R_transcripts (name, chromosome, start, end, direction, tags, bin, exons) VALUES ('SRR060644.1', 'arm_3R', %d, %d, %d, 'feature=transcript;bestRegion=(self);nbGaps=0;nbMismatches=0;ID=SRR060644.1;identity=100.0', 300011450, '')""" % ((i+123)%352, (i+523)%422, (i+866)%536))
if(not i%10000):
print("Insert:", i)
t2 = time()
print("Insert time", t2-t1)
conn.commit()
t1 = time()
c.execute("""SELECT * FROM tmpTranscripts_arm_3R_transcripts ORDER BY start, end, direction""")
i = 0
for row in c:
a = row[0]
if(not i%10000):
print("Get:", i, row)
i+=1
t2 = time()
print("Sort time", t2-t1)
c.close()
and for MongoDB
from pymongo import Connection
from pymongo import ASCENDING, DESCENDING
from time import time
connection = Connection()
connection = Connection('localhost', 27017)
db = connection['test-database']
collection = db['test-collection']
posts = db.posts
posts.create_index([("start", ASCENDING), ("end", ASCENDING), ("direction", ASCENDING)])
t1 = time()
for i in range(0, 5000000):
post = { "name": 'SRR060644.1',
"chromosome": 'arm_3R',
"start": (i+123)%352,
"end": (i+523)%422,
"direction": (i+866)%536,
"tags": 'feature=transcript;bestRegion=(self);nbGaps=0;nbMismatches=0;ID=SRR060644.1;identity=100.0',
"bin": 300011450,
"exons": ''}
posts.insert(post)
if(not i%10000):
print("Insert:", i)
t2 = time()
print("Insert time", t2-t1)
t1 = time()
i = 0
for post in posts.find().sort([("start", ASCENDING), ("end", ASCENDING), ("direction", ASCENDING)]):
if(not i%10000):
print("Get:", i, post)
i+=1
t2 = time()
print("Sort time", t2-t1)