I keep getting the warning "distributed.utils_perf - WARNING - full garbage collections took 19% CPU time recently" after my Dask code has finished. I am using Dask for a large seismic data computation. After the computation, I write the computed data to disk, and the writing-to-disk part takes much longer than the computation itself. Before I write the data to disk, I call client.close(), which I assumed means I am done with Dask. But the "distributed.utils_perf - WARNING - full garbage collections took 19% CPU time recently" warning keeps coming. While the computation was running I got this warning 3-4 times, but while writing the data to disk I get the warning every second. How can I get rid of this annoying warning? Thanks.
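Roughly, my script has this structure (heavily simplified; the array, chunking, and output file below are placeholders, not my actual seismic code):
from dask.distributed import Client
import dask.array as da
import numpy as np

client = Client()  # local cluster

# stand-in for the real seismic computation
data = da.random.random((20000, 20000), chunks=(2000, 2000))
result = data.mean(axis=0).compute()

client.close()  # at this point I assume I am done with Dask

# this part takes much longer than the computation,
# and the GC warning appears roughly every second while it runs
np.save("result.npy", result)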
-
Did you manage to find a solution? It's 60% for me and my code is very slow. – gerrit Jan 30 '20 at 16:26
-
I made the Dask cluster run in a separate process, so when I am done with the computing, I let that process finish. It improved the memory issue a lot. – NSJ Feb 04 '20 at 02:45
-
@NSJ Can you explain how you separated out the Dask cluster? – takachanbo Jun 11 '20 at 07:09
-
@takachanbo You can start the scheduler in one process and then a client in another. When you start the client, pass the address of the scheduler to the constructor. See the docs (https://docs.dask.org/en/latest/setup/single-distributed.html) for an example of doing this on one machine using LocalCluster. – user1993951 Oct 08 '20 at 15:03
-
This thread on the dask/distributed github has some useful comments on this warning: https://github.com/dask/distributed/issues/2801 – D.J. P. Dec 04 '21 at 02:08
4 Answers
The same thing was happening for me in Colab, where I start the session with
client = Client(n_workers=40, threads_per_worker=2)
I terminated all my Colab sessions, then reinstalled and re-imported all the Dask libraries:
!pip install dask
!pip install cloudpickle
!pip install 'dask[dataframe]'
!pip install 'dask[complete]'
from dask.distributed import Client
import dask.dataframe as dd
import dask.multiprocessing
Now everything works fine and I'm not facing any issues. I don't know why this solved my problem :D

-
It did the trick for me, on my system. Just confirming that your solution works. – Soerendip Dec 08 '22 at 19:24
I had been struggling with this warning too. I would get many of these warnings and then the workers would die. I was getting them because I had some custom Python functions for aggregating my data together that were handling large Python objects (dicts). It makes sense that so much time was being spent on garbage collection if I was creating these large objects.
I refactored my code so that more of the computation was done in parallel before the results were aggregated together, and the warnings went away.
I looked at the progress chart on the status page of the Dask dashboard to see which tasks were taking a long time to process (Dask tries to name tasks after the function in your code that called them, which can help, but they're not always that descriptive). From there I could figure out which part of my code I needed to optimise.
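To illustrate the shape of the refactor (this is not my actual code; the delayed functions and numbers below are just placeholders):
import dask
from dask.distributed import Client

client = Client()

# Before: each task returned a large dict and a custom function merged them
# all in one place, creating lots of big Python objects for the GC to chase.
# After: reduce inside each task so only small results move through the graph.

@dask.delayed
def process_chunk(chunk_id):
    values = [i * chunk_id for i in range(1_000_000)]  # stand-in for real work
    return sum(values), len(values)  # return a small summary, not the raw data

@dask.delayed
def combine(parts):
    total = sum(s for s, _ in parts)
    count = sum(n for _, n in parts)
    return total / count

partials = [process_chunk(i) for i in range(40)]
mean = combine(partials).compute()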

You can disable garbage collection in Python.
import gc
gc.disable()
I found that it was easier to manage Dask worker memory through periodic use of the Dask client's restart: Client.restart()
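For example (just a sketch; splitting the work into batches between restarts is my own assumption about how you might structure it):
import gc
from dask.distributed import Client

gc.disable()  # turn off Python's cyclic garbage collector in this process

client = Client()

for batch in range(5):
    # ... submit and gather one batch of Dask work here ...
    client.restart()  # restart the workers, clearing their memory and task state

gc.enable()
client.close()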

Just create a separate process to run the Dask cluster and return its scheduler address. Then create the client in your main process using that address.
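I no longer have the original code, so this is only a rough, untested sketch of the idea (the queue handshake and worker counts are illustrative, not what I actually used):
from multiprocessing import Process, Queue
from dask.distributed import Client, LocalCluster
import time

def run_cluster(address_queue):
    # start the cluster inside this child process and hand back only its address
    cluster = LocalCluster(n_workers=4, threads_per_worker=2)
    address_queue.put(cluster.scheduler_address)  # e.g. 'tcp://127.0.0.1:8786'
    while True:  # keep the process alive while the cluster is in use
        time.sleep(1)

if __name__ == "__main__":
    q = Queue()
    proc = Process(target=run_cluster, args=(q,))
    proc.start()
    address = q.get()

    client = Client(address)  # connect to the cluster running in the other process
    # ... do the Dask computation here ...
    client.close()

    proc.terminate()  # done with Dask: tear down the whole cluster process
    proc.join()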

-
Can you provide any limitations, assumptions or simplifications in your answer? See more details on how to answer at this link: https://stackoverflow.com/help/how-to-answer – Usama Abdulrehman Jun 14 '20 at 04:09
-
I wish I could provide a more detailed sample, but I moved to another project and don't have the code or cluster environment any more. – NSJ Jun 19 '20 at 19:49