4

I implemented a Python program that builds a database from a protein FASTA file. A FASTA file is a text file that contains macromolecular sequences; you can read more about the format here. From each protein my program generates a list of peptides, which are fragments of the protein. For my purposes the program builds and queries a SQLite database.
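For context, here is a minimal sketch of the kind of pipeline described, assuming a simplified trypsin-like cleavage rule that cuts after K or R; the question doesn't show the real program's digestion rule or schema, so everything here is illustrative:

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def digest(sequence):
    """Split a protein into peptides, cutting after K or R
    (a simplified trypsin-like rule, assumed for illustration)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides
```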

Do you know of any tricks to populate or query a SQLite database faster from Python? If I use a layer or an ORM like SQLAlchemy, can I improve performance?
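For what it's worth, the usual sqlite3-level tricks are independent of any ORM: batch inserts with `executemany` inside one explicit transaction, relax durability with PRAGMAs, and build indexes after the bulk load. A minimal sketch, with an invented one-table schema since the question doesn't show one:

```python
import sqlite3

conn = sqlite3.connect("peptides.db")

# Trade durability for speed: fewer fsyncs and an in-memory journal.
# Only safe if you can rebuild the database after a crash.
conn.execute("PRAGMA synchronous = OFF")
conn.execute("PRAGMA journal_mode = MEMORY")

conn.execute("CREATE TABLE IF NOT EXISTS peptide (protein TEXT, seq TEXT)")

rows = [("sp|P12345", "MAK"), ("sp|P12345", "R"), ("sp|P12345", "LYK")]

# One explicit transaction around many inserts is far faster than the
# default of one implicit transaction per INSERT statement.
with conn:
    conn.executemany("INSERT INTO peptide (protein, seq) VALUES (?, ?)", rows)

# Create indexes after bulk loading, not before.
conn.execute("CREATE INDEX IF NOT EXISTS idx_peptide_seq ON peptide (seq)")
conn.close()
```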

gunzapper
  • 457
  • 7
  • 19
  • 1
    SQLAlchemy/SQLObject let you use databases (not just SQLite) sorta-transparently, without having to know the SQL used to query. Internally they could be generating slower queries than hand-crafted ones (but I'm not very familiar with their internals). If you want to write fast-performing queries, you might have to learn SQL. But have you found that the performance of your queries is really lacking? SQLite has a query-optimization guide: http://www.sqlite.org/optoverview.html. Also, SQLite has a `:memory:` mode. – wkl Sep 23 '11 at 13:41
  • you could try running under [pypy](http://pypy.org/) – jterrace Sep 23 '11 at 13:46
  • 1
    Why are you using a relational database in the first place? Why not create dictionaries in memory? How big is this database? If it's under 2Gb, then in-memory pure-Python objects will be far, far faster than SQLite. – S.Lott Sep 23 '11 at 14:40
  • @S.Lott Many times the databases the program builds are larger than 2 GB. I need either to optimize or to understand whether there are alternatives. I could also serialize the dictionary or use JSON to make a simple DB, like this [recipe](http://code.activestate.com/recipes/576642/), which I love. – gunzapper Sep 23 '11 at 15:04
  • Without details, it's impossible to suggest optimizations. However, partitioning the data and running two, parallel, in-memory processes would probably be much faster than messing around with SQL. – S.Lott Sep 23 '11 at 16:57
  • 1
    gunzapper: are you actually storing string slices from the fasta file in a dict? You can just store the beginning and end points to save a lot of space if you do things smartly. Of course it depends on what you are doing. There might also be other ways to store the data, depending on what you are doing; for instance, you might want some sort of tree structure. (A sketch of the offset idea follows these comments.) – Justin Peel Sep 23 '11 at 20:27
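Following the offset suggestion in the last comment, here is a minimal sketch of storing (start, end) positions into the parent protein instead of the peptide strings themselves; the IDs and spans below are invented for illustration:

```python
# Keep each full protein sequence once, and represent every peptide as a
# (protein_id, start, end) triple; the substring is recovered on demand.
proteins = {"P12345": "MAKRLYKQQE"}

peptide_spans = [("P12345", 0, 3), ("P12345", 3, 4), ("P12345", 4, 7)]

def peptide_sequence(protein_id, start, end):
    return proteins[protein_id][start:end]

assert peptide_sequence("P12345", 0, 3) == "MAK"
```

Three small integers per peptide are much cheaper to store and index than millions of repeated substrings.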

1 Answer

4

You won't improve performance; an ORM layer adds a bit of overhead, but it will probably make experimenting easier. Given the size of your data, you might be able to create the database in memory, which would be faster. Or you might want to look at a different type of database, such as Redis.
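For illustration, a hedged sketch of both routes mentioned here: an in-memory SQLite database (same sqlite3 API, no disk I/O) and Redis through the redis-py client. The table and key names are invented, and the Redis part assumes a server running on localhost:

```python
import sqlite3

# In-memory SQLite: identical API, but the whole database lives in RAM.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE peptide (protein TEXT, seq TEXT)")
mem.execute("INSERT INTO peptide VALUES (?, ?)", ("P12345", "MAK"))

# Redis alternative (requires `pip install redis` and a running server);
# one set of peptides per protein, key name made up for this sketch.
import redis

r = redis.Redis(host="localhost", port=6379)
r.sadd("protein:P12345:peptides", "MAK", "R", "LYK")
print(r.smembers("protein:P12345:peptides"))
```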

Zach Kelling
  • 52,505
  • 13
  • 109
  • 108
  • 2
    Since I ran out of comment space in his question, and you mentioned sqlite3's `:memory:` mode, here's a related thread on getting performance out of `:memory:`: http://stackoverflow.com/questions/764710/sqlite-performance-benchmark-why-is-memory-so-slow-only-1-5x-as-fast-as-di – wkl Sep 23 '11 at 13:44
  • @zeekay: Too much data to insert... no, `:memory:` is not the best solution in my case, but I'm interested in Redis. It looks nice for the moment. – gunzapper Sep 23 '11 at 14:57