I'm writing a simple multi-process database table archiver that dumps compressed data from a Postgres database to a Google Cloud Storage bucket.
While doing the memory profiling dance, I noticed that objects I delete aren't actually being released from memory. I'm hoping someone has an explanation, or can point me to some material that would help me better understand what's going on.
Python 3.7
Here's the code:
import gzip
import ETL.dbtools as db
from google.cloud import storage
from memory_profiler import profile


@profile
def worker(table_name: str) -> None:
    """
    table_name: str name of table to be archived.
    """
    client = storage.Client()
    bucket = client.get_bucket('mein_bucket')
    blob = bucket.blob(table_name+'.csv.gzip')
    # db.read_table simply reads data from PostgreSQL database
    # using sqlalchemy library into a pandas DataFrame
    df = db.read_table(table_name, schema='das_schema')
    csv_data = df.to_csv(index=False).encode('utf-8')
    del df
    compressed_csv_data = gzip.compress(csv_data)
    del csv_data
    blob.upload_from_string(compressed_csv_data)
And the profile output:
Line #    Mem usage    Increment   Line Contents
================================================
     7    107.6 MiB    107.6 MiB   @profile
     8                             def worker(table_name: str) -> None:
     9                                 """
    10                                 table_name: str name of table to be archived.
    11                                 """
    12    107.8 MiB      0.1 MiB       client = storage.Client()
    13    111.4 MiB      3.6 MiB       bucket = client.get_bucket('mein_bucket')
    14    111.4 MiB      0.0 MiB       blob = bucket.blob(table_name+'.csv.gzip')
    15                                 # db.read_table simply reads data from PostgreSQL database
    16                                 # using sqlalchemy library into a pandas DataFrame
    17    414.0 MiB    302.6 MiB       df = db.read_table(table_name, schema='das_schema')
    18    589.5 MiB    175.5 MiB       csv_data = df.to_csv(index=False).encode('utf-8')
    19    589.5 MiB      0.0 MiB       del df
    20    566.5 MiB      0.0 MiB       compressed_csv_data = gzip.compress(csv_data)
    21    566.5 MiB      0.0 MiB       del csv_data
    22    566.5 MiB      0.0 MiB       blob.upload_from_string(compressed_csv_data)
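In case it helps narrow things down, here's a stripped-down sketch of what I think is the same allocate/compress/del pattern, with the database and GCS parts removed. The payload is just a synthetic CSV-like string built in memory (the row count and format are made up), and I've added explicit gc.collect() calls after each del to see whether forcing a collection changes the reported numbers.

import gc
import gzip

from memory_profiler import profile


@profile
def repro(n_rows: int = 5_000_000) -> None:
    # Stand-in for df.to_csv(index=False).encode('utf-8'):
    # a large bytes object built entirely in memory (row count is arbitrary).
    csv_data = '\n'.join(f'{i},value_{i}' for i in range(n_rows)).encode('utf-8')
    compressed_csv_data = gzip.compress(csv_data)
    del csv_data
    gc.collect()  # does an explicit collection change the reported usage?
    del compressed_csv_data
    gc.collect()


if __name__ == '__main__':
    repro()

Mainly I'm trying to work out whether the memory_profiler numbers mean the deleted objects are actually being leaked, or whether the process simply hasn't handed the memory back to the OS yet.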