As the memory of my computer (in a digital sandbox environment) was only 8 GB (now 14 GB), I am trying to make my script more memory-efficient. It is a script that analyzes pictures, and it works perfectly, but at some point I ran into the infamous MemoryError. Basically, I run a big loop to analyze the pictures, and at the end of each iteration I run the following to clear out memory:
del img, mask_naive   # placeholder names; the real script deletes many more variables
plt.close()           # close the current Matplotlib figure
gc.collect()          # force a garbage-collection pass
Each of these pieces of code improved my script: without them I could analyze ~2 pictures, and with them I can process around 15, but then I still run into a MemoryError. Why this still happens is, from what I understand, a much more complicated question, but at this point I am more focused on the solution.
After much troubleshooting, I found that one function in my script in particular is very memory-intensive. According to some pages on Stack Overflow, the script should perform better if I can run this piece of code in a subprocess. Unfortunately, I am not too familiar with programming, and I have reached the point where I see no more progress and am forced to ask for help.
I have tried to pass data between the two scripts, which seems to be a problem because the data is an array and not a string. I was also able to put the problematic line in a second script and import it directly in the first script; however, as far as I understand, importing a module executes it in the same process, whereas the idea of a subprocess is that the code runs and then exits, making it less memory-intensive.
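For reference, the kind of handoff I think I need between the two scripts is to serialize the array instead of passing it as text; a minimal sketch using a temporary .npy file (the name temp_img.npy is made up):

import numpy as np

np.save("temp_img.npy", img)    # in first.py, before starting the subprocess
img = np.load("temp_img.npy")   # in second.py, to recover the array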
I will not share the entire script, as there are many pre-processing, processing, and data-harvesting lines that I do not think are necessary to solve the problem. The major problem is at the cell marked # In[4]:
from plantcv import plantcv as pcv
import os
from os import listdir
import csv                       # needed for csv.writer below
import matplotlib.pyplot as plt  # needed for plt.close() below
import gc
import subprocess
from subprocess import Popen, PIPE

# In[2]:
filenames = os.listdir("directory with pictures")  # renamed from "list" to avoid shadowing the built-in

# In[3]:
for x in filenames:
    img, path, filename = pcv.readimage("directory with pictures" + x)

    # In[4] HEAVY LINE IN CODE!:
    mask_naive = pcv.naive_bayes_classifier(img, pdf_file="classifier model")

    # many processing steps on mask_naive variable

    # In[78]:
    with open('csv file being updated at end of loop', mode='a') as employee_file:
        employee_writer = csv.writer(employee_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)

    # In[79]:
    pcv.print_image(wanted_img_variable, "output directory" + x + ".png")

    del img, mask_naive   # plus a huge number of other variables
    plt.close()
    gc.collect()
To avoid running this # In[4] line inside my main process, I have tried to write a second .py script in a few ways, of which these are two examples:
from plantcv import plantcv as pcv
from first_script_name import img

mask_naive = pcv.naive_bayes_classifier(img, pdf_file="classifier model")
and
from plantcv import plantcv as pcv
from __main__ import img
mask_naive = pcv.naive_bayes_classifier(img, pdf_file="classifier model")
In first.py, instead of the # In[4] line, I have tried the following (among many other, probably less successful, attempts):
p1= subprocess.run("second.py", shell=True, input=img, stdout=subprocess.PIPE, text=True, check=True)
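From what I understand, this fails because img is a NumPy array, while text=True makes subprocess expect strings. If I read the docs correctly, the array would first have to be serialized to bytes, roughly like this (an untested sketch; note that anything second.py prints to stdout would corrupt the pickled result):

import pickle
import subprocess
import sys

# in first.py, replacing the # In[4] line: ship the array to second.py
# as pickled bytes over stdin, read the pickled mask back from stdout
payload = pickle.dumps(img)
p1 = subprocess.run([sys.executable, "second.py"],
                    input=payload, stdout=subprocess.PIPE, check=True)
mask_naive = pickle.loads(p1.stdout)

and second.py would then be:

import pickle
import sys

from plantcv import plantcv as pcv

img = pickle.loads(sys.stdin.buffer.read())          # recover the array
mask_naive = pcv.naive_bayes_classifier(img, pdf_file="classifier model")
sys.stdout.buffer.write(pickle.dumps(mask_naive))    # hand the result back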
It has to be noted that this script normally runs in a sandbox environment. When I do this and look at the memory, I get the following output: [screenshot: memory usage in the sandbox]. The MemoryError also only arises in the sandbox environment. Every loop iteration, memory usage slowly increases by approximately 0.15 GB until it reaches the maximum.
When I run the script in my home environment, I get the following memory usage: [screenshot: memory usage in the home environment]. There it does fluctuate, but overall it stays stable, and the script can run indefinitely without problems.
I am not too familiar with memory management in a sandbox environment, but I think it could be playing a role here as well. The desired outcome is to no longer run into the MemoryError. Who can guide me in the right direction?
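One direction I have seen suggested on Stack Overflow, but have not managed to apply yet, is the multiprocessing module, which as far as I understand can pass arrays between processes directly and lets the worker process exit after each call, returning its memory to the OS. A sketch of what I think that would look like (the path is a placeholder, and I am assuming the image pickles cleanly):

import multiprocessing as mp

from plantcv import plantcv as pcv

def classify(img):
    # runs in a throwaway child process
    return pcv.naive_bayes_classifier(img, pdf_file="classifier model")

if __name__ == "__main__":
    img, path, filename = pcv.readimage("directory with pictures/example.png")
    # maxtasksperchild=1 gives every call a fresh worker, so nothing accumulates
    with mp.Pool(processes=1, maxtasksperchild=1) as pool:
        mask_naive = pool.apply(classify, (img,))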
Many thanks.