
I am extracting data to a CSV file using Python; the dataset is over 1 million records. There appear to be memory issues with my script, because after a painstaking 5 hours and roughly 190k records written, the script's process gets killed.

Here is my terminal output:

(.venv)[cv1@mdecv01 maidea]$ python common_scripts/script_tests/ben-test-extract.py BEN
Generating CSV file. Please wait ...
Preparing to write file: BEN-data-20170731.csv
Killed
(.venv)[cv1@mdecv01 maidea]$

Is there a way I can extract this data with proper memory management?

Here is my script:

Edward Okech

2 Answers


You are not taking advantage of select_related or prefetch_related. If you do not use these two methods, you will end up performing an extra database query every time you access a related field (ForeignKey, ManyToManyField):

for beneficiary in Beneficiary.objects.all():
    if beneficiary.is_active:
        household = beneficiary.household  # one extra query per beneficiary
        if len(beneficiary.enrolments) > 0 and len(beneficiary.interventions) > 1:
            ...

It should be something like this:

for beneficiary in Beneficiary.objects.select_related(
    'household'  # joined into the main query
).prefetch_related(
    'enrolments',  # each relation fetched once for all rows
    'interventions'
):
    if beneficiary.is_active:
        household = beneficiary.household  # no extra query, already joined
        # .all() reads from the prefetch cache instead of hitting the database
        if len(beneficiary.enrolments.all()) > 0 and len(beneficiary.interventions.all()) > 1:
            ...
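
As a quick sanity check, you can count the queries each version issues. A sketch using Django's test utility CaptureQueriesContext, which also works outside the test runner:

from django.db import connection
from django.test.utils import CaptureQueriesContext

# Without select_related/prefetch_related the count grows with the number
# of beneficiaries; with them it should stay small and constant.
with CaptureQueriesContext(connection) as ctx:
    for beneficiary in Beneficiary.objects.select_related(
        'household'
    ).prefetch_related('enrolments', 'interventions'):
        household = beneficiary.household
        enrolment_count = len(beneficiary.enrolments.all())
print(len(ctx.captured_queries))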
Iain Shelvington
  • Filter in the queryset instead of pulling all the data, for example .filter(is_active=True); you can also filter by a count, for example .annotate(interventions_count=Count('interventions')).filter(interventions_count__gt=1)
  • Pull the data in iterations with offset and limit, for example [0:100], rather than pulling it all at once (smaller memory consumption); see the sketch after this list
  • Make use of select_related and prefetch_related to pre-select the related tables you need
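
Putting those points together, a minimal sketch of a batched extraction (the batch size, file name, and row fields are assumptions, not taken from the original script):

import csv

from django.db.models import Count

BATCH_SIZE = 1000  # assumed chunk size; tune to the available memory

# Push the is_active and count checks into the database instead of Python.
# distinct=True avoids inflated counts when combining two Count() joins.
queryset = (
    Beneficiary.objects
    .filter(is_active=True)
    .annotate(
        enrolments_count=Count('enrolments', distinct=True),
        interventions_count=Count('interventions', distinct=True),
    )
    .filter(enrolments_count__gt=0, interventions_count__gt=1)
    .select_related('household')
    .order_by('pk')  # a stable order is required for LIMIT/OFFSET paging
)

with open('BEN-data.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    start = 0
    while True:
        # Slicing translates to LIMIT/OFFSET, so only one batch of rows
        # is held in memory at a time
        batch = list(queryset[start:start + BATCH_SIZE])
        if not batch:
            break
        for beneficiary in batch:
            # Hypothetical columns; replace with the real fields
            writer.writerow([beneficiary.pk, str(beneficiary.household)])
        start += BATCH_SIZE

On databases where large OFFSET values get slow, keyset pagination (filtering with pk__gt=last_seen_pk and taking the first BATCH_SIZE rows) is a common alternative.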
iklinac