I am developing a web application that uses the SQLAlchemy ORM with a PostgreSQL backend. I have a use case where a client uploads a file of about 20 MB containing a large amount of data. What I am trying to do is run a series of SELECT, INSERT, and UPDATE commands against the database for about a million entries parsed from the uploaded file.
I am on Ubuntu 12.0 with 8 GB of RAM, and whenever I perform this massive insert/update operation, memory usage hits about 90% and the process freezes at about the halfway point of the inserts and updates.
During this period, nothing gets inserted or updated. Nothing happens at all, and unless I kill the process, it takes a toll on my whole system's performance.
I have 3 tables. Two contain data, and the third is a junction table containing the foreign keys of the first two tables plus a column recording the number of rows inserted into the first two tables. I do NOT have any indexes defined, and autocommit is off.
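To make the layout concrete, here is a rough sketch of the schema as I understand my own setup; the class, table, and column names below are placeholders, not my exact models:

```python
# Rough sketch of my three tables -- all names here are placeholders.
from sqlalchemy import Column, Integer, ForeignKey
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class TableA(Base):
    __tablename__ = 'table_a'
    id = Column(Integer, primary_key=True)
    # ... data columns ...

class TableB(Base):
    __tablename__ = 'table_b'
    id = Column(Integer, primary_key=True)
    # ... data columns ...

class Junction(Base):
    __tablename__ = 'junction'
    id = Column(Integer, primary_key=True)
    a_id = Column(Integer, ForeignKey('table_a.id'))
    b_id = Column(Integer, ForeignKey('table_b.id'))
    rows_inserted = Column(Integer)  # number of rows inserted into the two data tables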
I looked up bulk-insertion techniques and found suggestions such as "COPY is better than INSERT", "take your DB offline and insert", and deferring foreign key processing.
I checked this. Please let me know if there is any way to chunk my inserts so that I can serve other requests while the DB is occupied with writes; otherwise, my whole web application freezes during this one upload operation. Thanks in advance.
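To be clear about what I mean by chunking, something along these lines is what I have in mind; the chunk size and the model name are arbitrary placeholders:

```python
# What I mean by "chunking": flush and release objects every N rows instead of
# keeping a million pending objects in one session. CHUNK_SIZE and TableA are
# placeholders, not my real values.
CHUNK_SIZE = 10000

def insert_in_chunks(session, parsed_rows):
    for i, row in enumerate(parsed_rows, 1):
        session.add(TableA(col1=row[0], col2=row[1]))
        if i % CHUNK_SIZE == 0:
            session.flush()        # send the pending INSERTs to PostgreSQL
            session.expunge_all()  # drop references so Python can reclaim memory
    session.flush()                # flush the final partial chunk
```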
EDIT
I am using the Zope transaction manager, and my transactions get committed only at the end of all the inserts and updates. Until then I only issue multiple DBSession.add() calls but no commits. Could this be the issue? Do my pending session objects eat up all the RAM? I started suspecting this after reading the following in the SQLAlchemy docs:
As a general rule, keep the lifecycle of the session separate and external from functions and objects that access and/or manipulate database data.
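For context, my upload handler is currently shaped roughly like this (simplified; the model and variable names are placeholders):

```python
# Simplified shape of what the upload handler does today: every row is added to
# the one Zope-managed session and committed only once at the very end.
# TableA, parsed_rows and the surrounding logic are placeholders.
import transaction

def handle_upload(parsed_rows):
    for row in parsed_rows:                      # about a million entries
        DBSession.add(TableA(col1=row[0], col2=row[1]))
        # ... plus the related SELECTs, UPDATEs and junction-table rows ...
    transaction.commit()                         # the only commit, after all adds
```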