3

I am writing a page to download millions of records from my database. I plan to run it in an environment where I will have limited memory. Therefore I want to stream CSV data. This code for some reason still uses a lot of memory and that memory is not freeded after I finish the download. What is causing this leak. My app goes from taking 30mb of memory to 2GB

@app.route('/admin/querydb', methods=['GET', 'POST'])
@admin_filter
def admin_query_db():

    if request.method == 'POST':
        query = model.DriverStop.query.join(model.DriverDailyRoute, model.Agency).join(model.User)

        if 'date_filter_start' in request.form:
            start = datetime.datetime.strptime(request.form['start_date'], '%Y-%m-%d')
            start -= datetime.timedelta(days=1)
            query = query.filter(model.DriverDailyRoute.date >= start)

        if 'date_filter_end' in request.form:
            end = datetime.datetime.strptime(request.form['end_date'], '%Y-%m-%d')
            query = query.filter(model.DriverDailyRoute.date < end)

        if not 'recipient' in request.form:
            query = query.filter(model.Agency.agency_type != model.Agency.RECIPIENT)

        if not 'donor' in request.form:
            query = query.filter(model.Agency.agency_type != model.Agency.DONOR)


        header = ['Username', 'Last Name', 'First Name', 'Agency Name', 
                  'Agency Type', 'City', 'Date', 'Time', 'Is Special Stop', 'Cargo Temperature',
                  'Prepared', 'Produce', 'Dairy', 'Raw Meat', 'Perishable', 'Dry Goods',
                  'Bread', 'Total']

        def csv_line(items):
            return ''.join(['"' + str(s).replace('"', '""') + '",' for s in items][:-1])

        def gen_csv():
            yield csv_line(header) + '\n'
            for q in query.all():
                yield csv_line([q.route.driver.username, q.route.driver.last_name, q.route.driver.first_name,
                         q.agency.name, q.agency.agency_type, q.agency.city, q.route.date, 
                         q.time, q.special_stop, q.cargo_temp, q.prepared, q.produce,
                         q.dairy, q.raw_meat, q.perishable, q.dry_goods, q.bread, q.total_up()]) + '\n'

        return Response(gen_csv(), mimetype='text/csv')

    drivers = model.User.query.filter_by(acct_type=model.User.DRIVER).order_by(model.User.active, model.User.last_name, model.User.first_name, model.User.username).all()
    agencies = model.Agency.query.order_by(model.Agency.active, model.Agency.name).all()
    return render_template('admin/dbquery.html', page_title='Database Query', drivers=drivers, agencies=agencies)

Some of my other pages also have this behavior where they do not free memory after a large query.

Mikko Ohtamaa
  • 82,057
  • 50
  • 264
  • 435
chasep255
  • 11,745
  • 8
  • 58
  • 115
  • 1) Python does not free memory instantly after it discards it - run garbage collector manual and then check the memory usage 2) Something in the code must trigger full memory buffering instead of streaming - here are hints for memory profiling in Python http://stackoverflow.com/a/552810/315168 – Mikko Ohtamaa Aug 10 '15 at 21:27
  • calling gc.collect does nothing to help. – chasep255 Aug 10 '15 at 21:44

1 Answers1

5

Calling Query.all() causes SQLAlchemy to load all the results from database query and convert them to list in memory. You should use Query.yield_per() to load data in batches.

Yaroslav Admin
  • 13,880
  • 6
  • 63
  • 83
  • Tried it. Makes no difference. – chasep255 Aug 10 '15 at 21:38
  • for q in query.yield_per(100).enable_eagerloads(False).options(lazyload('*')).all(): – chasep255 Aug 10 '15 at 21:38
  • Not sure if it's the reason, but you don't need to add `.all()` after `.yield_per()`. – Yaroslav Admin Aug 10 '15 at 21:45
  • Still not sure why the memory is retained after the function returns though – chasep255 Aug 10 '15 at 21:47
  • 1
    Hard to say... Try to profile your memory as you were advised and discover what kind of objects are occupying it. It may give you a hint on how to fix it. Maybe it's not garbage collected yet, when you check or something prevents your objects from being garbage collected (they bound to session for example). – Yaroslav Admin Aug 10 '15 at 21:56