
I am developing a web application using the SQLAlchemy ORM with a PostgreSQL database backend. I have a use case where a client uploads a file of about 20 MB containing a large amount of data. What I am trying to do is run a series of SELECT, INSERT and UPDATE commands against the database for about a million entries parsed from the uploaded file.

I am on Ubuntu 12.0 with 8 GB of RAM, and whenever I perform this massive insert/update operation, memory usage hits about 90% and everything freezes about halfway through the inserts and updates.

During this period, nothing gets inserted or updated. Nothing happens at all, and unless I kill the process it takes a toll on my system's performance.

I have 3 tables. Two contain data, and the third is a junction table holding the foreign keys of the first two tables plus a column recording the number of rows inserted into the first two. I do NOT have any indexes defined, and autocommit is off.
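
For illustration, my models are roughly shaped like this (the names below are simplified placeholders, not my actual tables):

```
# Simplified sketch of the schema described above; table and column names
# are placeholders, not the real ones.
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class TableA(Base):
    __tablename__ = "table_a"
    id = Column(Integer, primary_key=True)
    name = Column(String)

class TableB(Base):
    __tablename__ = "table_b"
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Junction(Base):
    __tablename__ = "junction"
    id = Column(Integer, primary_key=True)
    a_id = Column(Integer, ForeignKey("table_a.id"))
    b_id = Column(Integer, ForeignKey("table_b.id"))
    row_count = Column(Integer)   # number of rows inserted into the two data tables
```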

I looked up bulk insertion and found suggestions such as "COPY is better than INSERT", "take your DB offline and insert", and deferring foreign key processing.

I checked this. Please let me know if there are any ways to chunk my inserts so that I can serve other requests while the DB is occupied with writes. Otherwise, my whole web application freezes during this particular upload operation. Thanks in advance.
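
Is something along these lines what chunking the inserts would look like, i.e. sending the bulk data through the Core in fixed-size batches, each in its own short transaction? (The table, the DSN and the batch size below are placeholders.)

```
# Rough sketch of chunked Core-level inserts, each batch in its own short
# transaction. Table, DSN and batch size are placeholder assumptions.
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

engine = create_engine("postgresql://user:pass@localhost/mydb")   # placeholder DSN
metadata = MetaData()
items = Table(                       # hypothetical table standing in for one of the data tables
    "items", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
)
metadata.create_all(engine)

parsed_rows = [{"name": "row-%d" % n} for n in range(100000)]     # stand-in for the parsed file

BATCH_SIZE = 1000
for start in range(0, len(parsed_rows), BATCH_SIZE):
    batch = parsed_rows[start:start + BATCH_SIZE]
    with engine.begin() as conn:     # each batch commits on its own
        conn.execute(items.insert(), batch)
```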

EDIT

I am using the Zope transaction manager, and my transactions get committed only at the end of all the inserts and updates. Until then I only issue multiple DBSession.add() calls, with no commits. Could this be the issue? Do my session objects eat up all the RAM? I started to suspect this after reading the following in the SQLAlchemy docs:

> As a general rule, keep the lifecycle of the session separate and external from functions and objects that access and/or manipulate database data.
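
For example, would committing every N rows, roughly as sketched below, stop the session from holding a million pending objects at once? (The model, the session setup and the batch size are placeholders mirroring a typical Pyramid/zope.sqlalchemy scaffold, not my exact code.)

```
# Sketch of committing in batches under the Zope transaction manager, so the
# session does not accumulate every pending object before a single final commit.
# Model, session setup, DSN and batch size are placeholder assumptions.
import transaction
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import scoped_session, sessionmaker
from zope.sqlalchemy import register

Base = declarative_base()

class MyModel(Base):                  # hypothetical model, not my real one
    __tablename__ = "my_model"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("postgresql://user:pass@localhost/mydb")   # placeholder DSN
DBSession = scoped_session(sessionmaker(bind=engine))
register(DBSession)                   # hook the session into the Zope transaction manager
Base.metadata.create_all(engine)

parsed_rows = [{"name": "row-%d" % n} for n in range(100000)]     # stand-in for the parsed file

BATCH_SIZE = 1000
for i, row in enumerate(parsed_rows, 1):
    DBSession.add(MyModel(name=row["name"]))
    if i % BATCH_SIZE == 0:
        transaction.commit()          # ends the current transaction; a new one starts on next use
transaction.commit()                  # commit the final partial batch
```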

  • Check `pg_stat_activity` and `pg_locks` to see if you have any blocking sessions. –  Apr 22 '15 at 13:31
  • Maybe it's the parsing of the uploaded file which is already fully loading the system? – splash Apr 22 '15 at 13:39
  • I checked pg_locks. No locks. pg_stat_activity says the transaction is happening – Tania Apr 22 '15 at 13:42
  • No. It just reads rows from a csv file that exists locally and performs these operations. Files < 2 MB are gracefully processed. – Tania Apr 22 '15 at 13:43
  • What exactly do the columns `xact_start`, `query_start`, `state_change`, `waiting`, `state` and `query` (in `pg_stat_activity`) show you for that connection? How many lines does the file contain? –  Apr 22 '15 at 14:04

0 Answers