Split up your larger task into smaller pieces.
Step 1 - Just read a CSV file
Both ContactCSVModel.import_from_filename() and ContactCSVModel.import_from_file() return the parsed CSV lines. Disable the interaction with your Django model so the database is never touched. This should speed up the task considerably. Print the imported data to verify that plain reading works - this should definitely work!
CSVModel
class ContactCSVModel(CsvModel):

    first_name = CharField()
    last_name = CharField()
    company = CharField()
    mobile = CharField()
    group = DjangoModelField(Group)
    contact_owner = DjangoModelField(User)

    class Meta:
        delimiter = "^"
Your code
def process(self):
    self.date_start_processing = timezone.now()
    try:
        # Try and import the CSV
        lines = ContactCSVModel.import_data(data=self.filepath, extra_fields=[
            {'value': self.group_id, 'position': 5},
            {'value': self.uploaded_by.id, 'position': 6}])
        print(lines)  # or use logging
        self._mark_processed(self.num_records)
    except Exception as e:
        self._mark_failed(str(e))
Step 2 - enable Django model interaction BUT disable the check for existing items in the DB.
Disable it because, with this feature enabled, every line in the CSV triggers a DB query to check for an existing item according to your natural key specification (I have read the source code). You probably know that all lines in your CSV are unique contacts anyway.
This helps if your problem is slow DB queries during the import, but it does not help if the import itself consumes too much memory.
class ContactCSVModel(CsvModel):

    first_name = CharField()
    last_name = CharField()
    company = CharField()
    mobile = CharField()
    group = DjangoModelField(Group)
    contact_owner = DjangoModelField(User)

    class Meta:
        delimiter = "^"
        dbModel = Contact
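As a side note: if I read the django-adaptors source correctly, the existing-item check is what an update declaration in Meta turns on, so the Meta above deliberately leaves it out. For illustration only (the key name 'mobile' is made up), this is the kind of option you want to avoid for a plain bulk import:

class Meta:
    delimiter = "^"
    dbModel = Contact
    update = {'keys': ['mobile']}  # assumed syntax - forces a DB lookup per CSV line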
Step 3 - Import equally sized chunks of CSV
Use the CSVModel and enable interaction with the Contact model, but feed smaller iterables to ContactCSVModel.import_data(). I set the chunk size to 500; change it to your needs. The code sample below gives you the idea - you need to change it a bit to fit it into your existing code (see the integration sketch after the snippet). This helps if memory consumption is the problem.
import csv

reader = csv.reader(open(self.filepath, newline=''))

def gen_chunks(reader, chunksize=100):
    """
    Chunk generator. Take a CSV `reader` and yield
    `chunksize` sized slices.
    """
    chunk = []
    for i, line in enumerate(reader):
        if i % chunksize == 0 and i > 0:
            yield chunk
            chunk = []  # start a fresh list instead of mutating the one just yielded
        chunk.append(line)
    if chunk:
        yield chunk

for chunk in gen_chunks(reader, chunksize=500):
    ContactCSVModel.import_data(data=chunk, extra_fields=[
        {'value': self.group_id, 'position': 5},
        {'value': self.uploaded_by.id, 'position': 6}])
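One way to fold this into the process() method from Step 1 - a sketch assuming gen_chunks() is defined as above and the same helper attributes (self.num_records, self._mark_processed, self._mark_failed) exist:

def process(self):
    self.date_start_processing = timezone.now()
    try:
        with open(self.filepath, newline='') as f:
            # feed import_data() one 500-line chunk at a time
            for chunk in gen_chunks(csv.reader(f), chunksize=500):
                ContactCSVModel.import_data(data=chunk, extra_fields=[
                    {'value': self.group_id, 'position': 5},
                    {'value': self.uploaded_by.id, 'position': 6}])
        self._mark_processed(self.num_records)
    except Exception as e:
        self._mark_failed(str(e))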
Step 4 - Target large memory consumption and slow operation
Because django-adaptors holds all Contact model instances in memory during the import, and because it commits each record individually instead of doing one bulk insert, it is not well suited for larger files.
Still, you are somewhat tied to django-adaptors - you can't switch to bulk inserts while relying on this package. Check the memory consumption during an import: under Linux with top or htop, on Windows with the Task Manager. If the process eats too much and the OS starts swapping, switch to another Django add-on with more efficient memory consumption and bulk inserts as an option - there are plenty of them for CSV imports.
Another hint is to use the csv module for reading and your Django model knowledge for interacting with the database directly. This is not really a challenge for you - just try it with isolated tasks of your big picture and put them together once they work - good luck. A rough sketch of that approach follows below.
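To make the hint concrete, here is a minimal sketch of such a hand-rolled import using the csv module and Django's bulk_create(). The import paths, the Contact field names and the batch size of 500 are assumptions carried over from the snippets above, not tested code - adapt them to your project:

import csv

# Adjust these imports to your project - Contact and Group live in your app,
# contact_owner is assumed to be a django.contrib.auth User.
from django.contrib.auth.models import User
from yourapp.models import Contact, Group


def import_contacts(filepath, group_id, uploaded_by_id, batch_size=500):
    """Read a '^'-delimited CSV and insert Contact rows in bulk batches."""
    group = Group.objects.get(pk=group_id)       # resolve the FK targets once,
    owner = User.objects.get(pk=uploaded_by_id)  # not once per CSV line
    batch = []
    with open(filepath, newline='') as f:
        for row in csv.reader(f, delimiter='^'):
            first_name, last_name, company, mobile = row[:4]
            batch.append(Contact(
                first_name=first_name, last_name=last_name,
                company=company, mobile=mobile,
                group=group, contact_owner=owner))
            if len(batch) >= batch_size:
                Contact.objects.bulk_create(batch)  # one INSERT per batch
                batch = []
    if batch:
        Contact.objects.bulk_create(batch)

Because each batch is flushed and dropped, memory stays bounded regardless of file size, and each batch becomes a single INSERT instead of one commit per contact.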