
I'm writing a simple multi-process database table archiver that dumps compressed data from a Postgres database to a Google Cloud Storage bucket.

While doing the memory-profiling dance, I noticed that objects I'm deleting with `del` aren't actually being released from memory. I was hoping someone has an explanation, or could point me to some material that would help me better understand what's going on.

Python 3.7

Here's the code.

import gzip
import ETL.dbtools as db
from google.cloud import storage
from memory_profiler import *


@profile
def worker(table_name: str) -> None:
    """
    table_name: str name of table to be archived.
    """
    client = storage.Client()
    bucket = client.get_bucket('mein_bucket')
    blob = bucket.blob(table_name+'.csv.gzip')
    # // db.read_table simply reads data from PostgreSQL database
    # //    using sqlalchemy library into a pandas DataFrame
    df = db.read_table(table_name, schema='das_schema')
    csv_data = df.to_csv(index=False).encode('utf-8')
    del df
    compressed_csv_data = gzip.compress(csv_data)
    del csv_data
    blob.upload_from_string(compressed_csv_data)

And the profile output.

Line #    Mem usage    Increment   Line Contents
================================================
     7    107.6 MiB    107.6 MiB   @profile
     8                             def worker(table_name: str) -> None:
     9                                 """
    10                                 table_name: str name of table to be archived.
    11                                 """
    12    107.8 MiB      0.1 MiB       client = storage.Client()
    13    111.4 MiB      3.6 MiB       bucket = client.get_bucket('mein_bucket')
    14    111.4 MiB      0.0 MiB       blob = bucket.blob(table_name+'.csv.gzip')
    15                                 # // db.read_table simply reads data from PostgreSQL database
    16                                 # //    using sqlalchemy library into a pandas DataFrame
    17    414.0 MiB    302.6 MiB       df = db.read_table(table_name, schema='das_schema')
    18    589.5 MiB    175.5 MiB       csv_data = df.to_csv(index=False).encode('utf-8')
    19    589.5 MiB      0.0 MiB       del df
    20    566.5 MiB      0.0 MiB       compressed_csv_data = gzip.compress(csv_data)
    21    566.5 MiB      0.0 MiB       del csv_data
    22    566.5 MiB      0.0 MiB       blob.upload_from_string(compressed_csv_data)
  • Don't you have to delete compressed_csv_data as well, so that the references in csv_data and df can be removed? – MasterOfTheHouse Jan 14 '20 at 18:25
  • **`del` does not delete objects**. It deletes *names*. Indeed, since those names are local, they will automatically be deleted when the function terminates, making those `del` statements pretty pointless. – juanpa.arrivillaga Jan 14 '20 at 18:39
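
To illustrate the point in the comment above, here is a minimal, self-contained sketch (the buffer size is arbitrary): `del` only unbinds a name, and the underlying object can only be reclaimed once no references to it remain.

def demo() -> None:
    data = bytearray(200 * 1024 * 1024)  # ~200 MiB buffer (arbitrary size)
    alias = data                         # a second name bound to the same object
    del data                             # unbinds the name 'data'; the object survives
    print(len(alias))                    # still 209715200 -- 'alias' keeps the object alive
    del alias                            # last reference gone; the buffer can now be reclaimed

demo()

In worker() above, df, csv_data and compressed_csv_data are likewise local names, so they are dropped when the function returns regardless of the del statements.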
