
I have a Python (3.5.2) application that needs to initialize a large number of objects, and I'm encountering occasional slow-downs.

The slow-down seems to occur on a single initialization: most calls to __init__ take well under a microsecond, but one of them sometimes takes several dozen seconds.

I've been able to reproduce this with the following snippet, which creates 500,000 instances of a trivial class.

import cProfile


class A:
    def __init__(self):
        pass


cProfile.run('[A() for _ in range(500000)]')

I'm running this code in a notebook. Most of the time (about 9 runs out of 10), it outputs the following (normal execution):

         500004 function calls in 0.675 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   500000    0.031    0.000    0.031    0.000 <ipython-input-5-634b77609653>:2(__init__)
        1    0.627    0.627    0.657    0.657 <string>:1(<listcomp>)
        1    0.018    0.018    0.675    0.675 <string>:1(<module>)
        1    0.000    0.000    0.675    0.675 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


The other times, it outputs the following (slow execution):

         500004 function calls in 40.154 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   500000    0.031    0.000    0.031    0.000 <ipython-input-74-634b77609653>:2(__init__)
        1   40.110   40.110   40.140   40.140 <string>:1(<listcomp>)
        1    0.014    0.014   40.154   40.154 <string>:1(<module>)
        1    0.000    0.000   40.154   40.154 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Watching the loop with tqdm shows it getting stuck on a single iteration. It's important to note that I was only able to reproduce this in a notebook that already had a lot of memory allocated.

I suspect it comes from the lists of references to objects maintained by the garbage collector, which might occasionally need to be traversed or copied.
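The generational structure behind this suspicion can be inspected directly. This is a minimal sketch using the standard gc module; the printed numbers will of course differ per interpreter session:

```python
import gc

# CPython's cyclic collector groups tracked objects into three
# generations; a collection of generation n runs when its allocation
# counter exceeds the corresponding threshold. In a long-lived
# notebook, generation 2 can accumulate millions of objects, so a
# full collection has to walk all of them.
print(gc.get_threshold())       # collection thresholds, default (700, 10, 10)
print(gc.get_count())           # current allocation counters per generation
print(len(gc.get_objects()))    # total number of objects tracked by gc
```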

What exactly is happening here, and is there any way to avoid it?
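One way to test this suspicion is to timestamp every collection pass with gc.callbacks and check whether the stall coincides with a full (generation 2) collection. A rough sketch (the callback and variable names are made up for illustration):

```python
import gc
import time

events = []   # (generation, duration in seconds) per collection pass
_start = {}

def gc_callback(phase, info):
    # gc invokes this with phase "start" before and "stop" after
    # each collection; info["generation"] says which generation ran.
    if phase == "start":
        _start["t"] = time.perf_counter()
    else:
        events.append((info["generation"], time.perf_counter() - _start["t"]))

gc.callbacks.append(gc_callback)

class A:
    def __init__(self):
        pass

objs = [A() for _ in range(500000)]
gc.callbacks.remove(gc_callback)

print("%d collections observed" % len(events))
print("longest pass: gen %d, %.6f s" % max(events, key=lambda e: e[1]))
```

If the hypothesis holds, a slow run should show one generation-2 entry whose duration matches the observed stall.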

rdbs
    I'd say garbage collector is kicking in. Try disabling it (`import gc; gc.disable()`) and see if it helps. – Radosław Cybulski Jun 11 '19 at 09:24
  • I wonder of something in here may be relevant: https://stackoverflow.com/questions/311775/python-create-a-list-with-initial-capacity/24173567 – doctorlove Jun 11 '19 at 09:35
  • @RadosławCybulski indeed, doing this, I've not been able to reproduce it. – rdbs Jun 11 '19 at 10:29
  • @doctorlove not sure if I can allocate memory for lists used by the garbage collector – rdbs Jun 11 '19 at 10:31
  • You can disable the garbage collector safely for some time. But if your application runs for a long period, you'll have to enable gc again at one point or another, and it will take a long time again. I guess don't create a million objects in Python. ;) I'd suggest moving some part of the code into a compiled language (for example C++ with boost.python). – Radosław Cybulski Jun 11 '19 at 11:13
  • This strategy of enabling/disabling garbage collector is well adapted to my problem, as the app needs to answer quickly on request but is idle most of the time. Thanks for the suggestion! Would be perfect to understand what happens though – rdbs Jun 11 '19 at 13:01
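The enable/disable strategy from the comments could be sketched like this (a rough outline, not a complete fix; reference counting still frees most objects while gc is disabled, since the cyclic collector only handles reference cycles):

```python
import gc

class A:
    def __init__(self):
        pass

# Suspend the cyclic garbage collector around the bulk allocation so
# no collection pass can interrupt it.
gc.disable()
try:
    objs = [A() for _ in range(500000)]
finally:
    gc.enable()

# Pay the collection cost at a controlled moment instead (e.g. while
# the app is idle); gc.collect() returns the number of unreachable
# objects it found.
unreachable = gc.collect()
print("collected %d unreachable objects" % unreachable)
```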

0 Answers