
I am trying to read a 200 MB CSV file and write it to a SQL database using SQLAlchemy. Each line has about 30 columns, of which I use only 8, using the code below. However, the code runs really slowly! Is there a way to improve this? I would like to use map/list comprehensions or other techniques. As you can tell, I am a newbie. Thanks for your help.

for ddata in dread:
    record = DailyData()
    record.set_campaign_params(pdata)  # pdata is assigned in the previous step
    record.set_daily_data(ddata)       # class method that uses only 8 of the 30 items in the row
    session.add(record)
    session.commit()  # write to the SQL database
user1347835

1 Answer


Don't commit on every record. Commit, or just flush, every 1000 records or so:

for i, data in enumerate(csv_stuff, 1):
    rec = MyORMObject()
    rec.set_stuff(data)
    session.add(rec)
    if i % 1000 == 0:
        session.flush()  # send pending rows to the DB without ending the transaction
session.commit()  # flushes everything remaining + commits
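
Applied to your loop, that looks like this (a minimal sketch, assuming dread and pdata are set up as in your question):

for i, ddata in enumerate(dread, 1):
    record = DailyData()
    record.set_campaign_params(pdata)
    record.set_daily_data(ddata)
    session.add(record)
    if i % 1000 == 0:
        session.flush()  # push the pending INSERTs, keep the transaction open
session.commit()  # one commit at the end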

If that's still giving you problems, then do some basic profiling; see my post at How can I profile a SQLAlchemy powered application?
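
For the profiling step, here's a minimal sketch using the standard library's cProfile (load_csv is a hypothetical wrapper around your import loop, not an existing function):

import cProfile
import pstats

# load_csv is a hypothetical function wrapping the import loop above
cProfile.run("load_csv()", "import.prof")

# show the 20 most expensive calls by cumulative time
pstats.Stats("import.prof").sort_stats("cumulative").print_stats(20)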

zzzeek
  • Thank you! This resulted in a big boost in speed. I will also use the profiler to educate myself more about SQLAlchemy optimization – user1347835 Apr 23 '12 at 20:37