I'm running a conversion script that commits large amounts of data to a db using Django's ORM. I use manual commit to speed up the process. I have hundreds of files to commit, and each file will create more than a million objects.
I'm using Windows 7 64-bit. I noticed the Python process keeps growing until it consumes more than 800 MB, and that's only for the first file!
The script loops over the records in a text file, reusing the same variables and not accumulating any lists or tuples.
I read here that this is a general problem for Python (and perhaps for any program), but I was hoping Django or Python has some explicit way to reduce the process size (there's a rough sketch of one idea I had after the code overview below)...
Here's an overview of the code:
import sys, os
sys.path.append(r'D:\MyProject')
os.environ['DJANGO_SETTINGS_MODULE'] = 'my_project.settings'
from django.core.management import setup_environ
from convert_to_db import settings
from convert_to_db.convert.models import Model1, Model2, Model3
setup_environ(settings)
from django.db import transaction

@transaction.commit_manually  # transactions are committed explicitly below
def process_file(filename):
    data_file = open(filename, 'r')
    model1, created = Model1.objects.get_or_create([some condition])
    if created:
        model1.save()
    input_row_i = 0  # row counter
    while 1:
        line = data_file.readline()
        if line == '':
            break
        input_row_i += 1
        if not (input_row_i % 5000):  # commit every 5000 rows
            transaction.commit()
        line = line[:-1]  # remove \n
        elements = line.split(',')
        d0 = elements[0]
        d1 = elements[1]
        d2 = elements[2]
        model2, created = Model2.objects.get_or_create([some condition])
        if created:
            model2.save()
        model3 = Model3(d0=d0, d1=d1, d2=d2)
        model3.save()
    data_file.close()
    transaction.commit()
# Some code that calls process_file() per file
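One idea I'm considering, though I haven't verified it addresses the real cause: while DEBUG is True, Django keeps every executed query in django.db.connection.queries, so clearing that log (and forcing a garbage-collection pass) next to the periodic commit might bound the growth. A rough sketch of what I mean (reset_queries() and gc.collect() here are my guesses, not something I've confirmed helps):

import gc
from django import db

# ... inside the per-record loop, next to the periodic commit ...
if not (input_row_i % 5000):
    transaction.commit()
    db.reset_queries()  # drop the query log Django keeps while DEBUG=True
    gc.collect()        # ask Python to release unreferenced objects

Is something like that the right direction, or is there a better, explicit way to keep the process size down?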