
I am trying to run a script that extracts some information from a pandas dataframe and saves figures containing a subset of that dataframe together with an image read from a folder.

The script runs fine for fewer than 700 images. With more images, the script gets killed for running out of memory (the Python process exits with a "Killed" message, and the dmesg command on the Linux terminal reports Out of Memory).

The script looks like this (I tried closing figure and deleting variables at the end of the loop, but it didn't help):

import os
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from matplotlib import gridspec
import pandas as pd
import numpy as np

# READ CSV ~20k lines, not an issue for memory
df = pd.read_csv('20220221-1516_export.csv')
images = os.listdir('./slices') # list of images contained in the folder
gs  = gridspec.GridSpec(2, 3, width_ratios=[1, 3, 1], height_ratios=[3.5, 1])

for image_name in images:

    image_df = df[df.filename == image_name][['label', 'grader1', 'grader2', 'grader3', 'grader4', 'grader5']]
    image_df = image_df[~image_df.filter(like='grader').apply(set, axis=1).isin([{False, np.nan}, {False}])].set_index('label') 

    fig = plt.figure(figsize=(10, 7))

    ax0 = fig.add_subplot(gs[1])
    img = mpimg.imread(os.path.join('slices', image_name))
    ax0.imshow(img, cmap='gray')

    ax1 = fig.add_subplot(gs[3:6])
    ax1.axis('tight')
    ax1.axis('off')
    colors = image_df.applymap(lambda x: '#9BCA3E' if x == True else ('#ED5314' if x == False else '#C5C7D8'))

    ax1.table(cellText=image_df.values, colLabels=image_df.columns, rowLabels=image_df.index, loc='center', cellColours=colors.values)

    fig.tight_layout()
    fig.savefig(os.path.join('output', image_name))
    
    # I tried to close figures and delete variables but nothing changes
    plt.close(fig)
    del image_df, fig, ax0, ax1, colors, img

The CSV file is composed of about 20 thousand similar lines. Below is a part of it:

,filename,label,grader1,grader2,grader3,grader4,grader5
0,98c0c8fe7f17477da6620054936871cd.png,label1,,False,False,,False
1,98c0c8fe7f17477da6620054936871cd.png,label2,,False,False,,False
2,98c0c8fe7f17477da6620054936871cd.png,label3,,False,False,,False
3,98c0c8fe7f17477da6620054936871cd.png,label4,,False,False,,False
4,98c0c8fe7f17477da6620054936871cd.png,label5,,False,False,,False
5,98c0c8fe7f17477da6620054936871cd.png,label8,,False,False,,False
6,98c0c8fe7f17477da6620054936871cd.png,label9,,False,False,,False
7,98c0c8fe7f17477da6620054936871cd.png,label10,,False,False,,False
...
14,e369b623efbe4fae8efaf5d61d47b7cd.png,label8,False,False,,,False
15,e369b623efbe4fae8efaf5d61d47b7cd.png,label9,False,False,,,False
16,e369b623efbe4fae8efaf5d61d47b7cd.png,label10,False,False,,,False

From my understanding, memory usage should not increase with the number of images, as everything (variables and files) gets deleted or closed at the end of each loop iteration. What am I doing wrong, and what could be the cause of the Out of Memory issue?
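One way to check the assumption that memory stays flat is to sample it every few iterations. The following is a minimal sketch using the standard-library `tracemalloc` module; the helper `measure_growth` and the two demo step functions are hypothetical names introduced here for illustration. Note that `tracemalloc` only sees allocations made through Python's allocator, so memory held by native code may not show up in these numbers.

```python
import gc
import tracemalloc


def measure_growth(step, n_iter, every=10):
    """Call step(i) n_iter times and sample traced memory every `every` iterations.

    Returns a list of (iteration, traced_bytes) tuples.
    """
    tracemalloc.start()
    samples = []
    try:
        for i in range(n_iter):
            step(i)
            if i % every == 0:
                gc.collect()  # drop collectable garbage before sampling
                current, _peak = tracemalloc.get_traced_memory()
                samples.append((i, current))
    finally:
        tracemalloc.stop()
    return samples


# Hypothetical stand-ins for a loop body: one keeps references, one does not.
hoard = []


def leaky_step(i):
    hoard.append(bytearray(100_000))  # reference survives the iteration


def clean_step(i):
    bytearray(100_000)  # released as soon as the call returns


leaky = measure_growth(leaky_step, 50)
clean = measure_growth(clean_step, 50)
leaky_growth = leaky[-1][1] - leaky[0][1]
clean_growth = clean[-1][1] - clean[0][1]
print(f"leaky: +{leaky_growth} bytes, clean: +{clean_growth} bytes")
```

Wrapping the body of the image loop in a function and passing it as `step` would show whether traced memory climbs steadily with the iteration count or stays roughly constant.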

The following posts did not help with my issue:

Konrad Rudolph
scandav
  • would adding `ax0.clear()`, `ax1.clear()`, `fig.clear()` before `plt.close(fig)` help? – Raymond Kwok Feb 21 '22 at 16:49
  • if it doesn't help, perhaps you can try creating `fig`, `ax0` and `ax1` outside of the loop and re-using them throughout the loop? – Raymond Kwok Feb 21 '22 at 16:56
  • Clearing axes and figure before closing the figure didn't help. The second solution you proposed helped: create `fig`, `ax0`, `ax1` once outside the loop and add `ax0.clear()` and `ax1.clear()` before `plt.close(fig)`. Thanks! If you want to post it as an answer, I'll accept it – scandav Feb 22 '22 at 09:00
  • I appreciate that :) I was just brainstorming but I actually don't know about the underlying problem behind this memory issue. Hope that someone who knows better will post an answer in the future. – Raymond Kwok Feb 22 '22 at 09:47
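
The workaround described in the comments (create `fig`, `ax0` and `ax1` once and clear the axes between images) could look roughly like the sketch below. It is not a confirmed fix for the underlying cause, which the comments leave open. Assumptions made here: the figure is closed once after the loop rather than on every iteration, the table cells are colored with a plain Python helper instead of `applymap`, and the images and tables are small synthetic stand-ins so the example runs on its own. The names `render_all` and `cell_color` are hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display required
import matplotlib.pyplot as plt
from matplotlib import gridspec
import numpy as np
import pandas as pd


def cell_color(value):
    """Green for True, red for False, grey for missing values."""
    if pd.isna(value):
        return '#C5C7D8'
    return '#9BCA3E' if bool(value) else '#ED5314'


def render_all(tables, images, out_dir):
    """Save one file per image, reusing a single figure and its two axes."""
    gs = gridspec.GridSpec(2, 3, width_ratios=[1, 3, 1], height_ratios=[3.5, 1])
    fig = plt.figure(figsize=(10, 7))
    ax0 = fig.add_subplot(gs[1])
    ax1 = fig.add_subplot(gs[3:6])
    written = []
    try:
        for name, img in images.items():
            table_df = tables[name]
            # Remove the previous image and table instead of building new axes.
            ax0.clear()
            ax1.clear()
            ax0.imshow(img, cmap='gray')
            ax1.axis('tight')
            ax1.axis('off')
            if len(table_df):  # ax.table needs at least one row
                values = table_df.to_numpy()
                ax1.table(
                    cellText=[[str(v) for v in row] for row in values],
                    cellColours=[[cell_color(v) for v in row] for row in values],
                    colLabels=list(table_df.columns),
                    rowLabels=list(table_df.index),
                    loc='center',
                )
            fig.tight_layout()
            path = os.path.join(out_dir, name)
            fig.savefig(path)
            written.append(path)
    finally:
        plt.close(fig)  # closed once, after all images are done
    return written


# Synthetic demo data (assumption: stands in for the CSV rows and ./slices images).
demo_table = pd.DataFrame(
    {'grader1': [True, False, np.nan], 'grader2': [False, np.nan, True]},
    index=['label1', 'label2', 'label3'],
    dtype=object,
)
demo_images = {f'img{i}.png': np.random.default_rng(i).random((32, 32)) for i in range(3)}
demo_tables = {name: demo_table for name in demo_images}

out_dir = tempfile.mkdtemp()
written = render_all(demo_tables, demo_images, out_dir)
```

In the original script, `images` would come from `os.listdir('./slices')` with `mpimg.imread`, and each table from the filtered `image_df`; the loop structure stays the same.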

0 Answers