
I wrote about 50 classes that I use to connect to and work with websites using mechanize and threading. They all run concurrently, but they don't depend on each other, so that means one class, one website, one thread. It's not a particularly elegant solution, especially for managing the code, since a lot of the code repeats in each class (but not nearly enough to merge them into one class taking arguments, as some sites may require additional processing of retrieved data in the middle of methods, like 'login', that others might not need). As I said, it's not elegant, but it works. Needless to say, I welcome all recommendations on how to write this better without the one-class-per-website approach. Adding functionality to each class, or just managing the code overall, is a daunting task.

However, I found out that each thread takes about 8 MB of memory, so with 50 running threads we are looking at about 400 MB of usage. If it were running on my own system I wouldn't have a problem with that, but since it's running on a VPS with only 1 GB of memory, it's starting to be an issue. Can you tell me how to reduce the memory usage, or is there any other way to work with multiple sites concurrently?

I used this quick test Python program to check whether it's the data stored in my application's variables that is using the memory, or something else. As you can see in the following code, the threads do nothing but call the sleep() function, yet each one still uses 8 MB of memory.

from thread import start_new_thread
from time import sleep

def sleeper():
    try:
        while 1:
            sleep(10000)
    except:
        if running: raise

def test():
    global running
    n = 0
    running = True
    try:
        while 1:
            start_new_thread(sleeper, ())
            n += 1
            if not (n % 50):
                print n
    except Exception, e:
        running = False
        print 'Exception raised:', e
    print 'Biggest number of threads:', n

if __name__ == '__main__':
    test()

When I run this, the output is:

50
100
150
Exception raised: can't start new thread
Biggest number of threads: 188

And by removing the `running = False` line, I can then measure free memory using the `free -m` command in the shell:

             total       used       free     shared    buffers     cached
Mem:          1536       1533          2          0          0          0
-/+ buffers/cache:       1533          2
Swap:            0          0          0

The calculation behind the 8 MB-per-thread figure is then simple: take the difference between the memory used before and while the above test application is running, and divide it by the maximum number of threads it managed to start.

It's probably only reserved memory, because looking at `top`, the python process uses only about 0.6% of memory.
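The 8 MB figure matches the default thread stack size (what `ulimit -s` controls). In current Python it can also be lowered from inside the program with `threading.stack_size()`, called before any threads start; a minimal sketch, with 64 KiB as an illustrative value:

```python
import threading

# Must be called before the worker threads are created. 32 KiB is the
# minimum on most platforms; 64 KiB here is an illustrative choice, far
# below the ~8 MB per-thread default reported by `ulimit -s`.
threading.stack_size(64 * 1024)

stop = threading.Event()

def sleeper():
    # Park the thread cheaply until shutdown, instead of sleep(10000).
    stop.wait()

threads = [threading.Thread(target=sleeper) for _ in range(50)]
for t in threads:
    t.start()

print(threading.active_count())  # 51 in a standalone run: 50 sleepers + main
stop.set()
for t in threads:
    t.join()
```
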

Gargauth
  • What's taking up the memory? I'd venture to guess that it's the data you're extracting from the sites. If that's the case, then there's probably not a lot that you could do short of throttling the number of executing threads. – Demian Brecht Jan 09 '12 at 23:30
  • How do you exactly measure memory usage? I'd guess, that those 8MB are not really allocated to each single thread. A huge part of those 8MB may be shared between the threads (just a guess..)? – Frunsi Jan 09 '12 at 23:34
  • Demian and frunsi, I edited my question to answer both of your questions. Thanks! – Gargauth Jan 09 '12 at 23:39
  • Is this a hosting? What about `ulimit -u`? And `ulimit -a`? – dani herrera Jan 09 '12 at 23:52
  • @Andrew: So, you roughly measured the overhead of a single thread in Python. After all, 8 MB sounds reasonable these days ... – Frunsi Jan 09 '12 at 23:54
  • @danihp: Before I posted my question I found a similar answer to what you suggest here: http://stackoverflow.com/questions/5636660/why-does-python-thread-consume-so-much-memory . I tried ulimit -s 1024, but I didn't notice any improvement; I could still run only about 188 threads before memory was depleted. – Gargauth Jan 09 '12 at 23:55
  • @danihp: Ah, so that's why it didn't work! :) I made a typo in that command. I just tried ulimit -s 1024 I could start roughly 1500 processes. – Gargauth Jan 10 '12 at 00:02
  • Sure, you can :) I'll still wait to see what people say about the problem I'm facing. Although this particular problem can be partially (and temporarily) solved by lowering the stack size limit, that's not a great solution by far (and I'm no expert, but changing a system setting for one Python program might cause other side effects). If there are other solutions people could point me to, like the one Frunsi proposed, I welcome them :) Also, as I wrote before, lowering the stack size limit using `ulimit -s` is only temporary, per session. – Gargauth Jan 10 '12 at 00:19
  • 1
    @andrew, `ulimit -s` may be fixed whith [limit pam module](http://linux.die.net/man/8/pam_limits) they are a soft and hard value for each parameter. Also you can assign a custom limit for each user in bash.rc. For example take a look to Oracle documentation, [Oracle server needs to customize this parameters to work properly](http://docs.oracle.com/cd/E11882_01/install.112/e24326/toc.htm#BHCCADGD). – dani herrera Jan 10 '12 at 08:25

4 Answers


Using "one thread per request" is OK and easy for many use cases. However, it requires a lot of resources (as you experienced).

A better approach is an asynchronous one, but unfortunately it is a lot more complex.

Some hints in this direction: asynchronous frameworks such as Twisted, or the standard library's asyncore module.
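In current Python, the standard-library `asyncio` module (which postdates this answer) implements the same asynchronous model; a minimal sketch, where the site names are hypothetical and `asyncio.sleep` stands in for a real non-blocking HTTP request:

```python
import asyncio

async def fetch(site):
    # Stand-in for a real non-blocking HTTP request (e.g. via aiohttp);
    # awaiting here simulates network latency without holding a thread stack.
    await asyncio.sleep(0.01)
    return site, "ok"

async def main(sites):
    # All "requests" wait concurrently on a single thread,
    # instead of one 8 MB-stack thread per site.
    return dict(await asyncio.gather(*(fetch(s) for s in sites)))

sites = ["site-a", "site-b", "site-c"]  # hypothetical site names
results = asyncio.run(main(sites))
print(results)  # {'site-a': 'ok', 'site-b': 'ok', 'site-c': 'ok'}
```
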

Frunsi
  • Thanks, much appreciated. I read about Twisted before, but sadly I don't know much about it, and by the looks of it I wouldn't be able to use mechanize with it. I'll see whether I can make mechanize work with asyncore. – Gargauth Jan 10 '12 at 00:14
  • After all, a "perfect" solution would be a mix of thread pools with one thread per CPU core (to utilize them for processing tasks) and asynchronous I/O. A practical solution will depend on your actual application code. Maybe even a simple solution based on `select` will do it for you. – Frunsi Jan 10 '12 at 00:35
  • This means: in your thread, send a bunch of requests, then enter a loop which will `select` on the appropriate sockets and handle any incoming data one by one... and so on. After all, the OS takes care of socket I/O anyway; your task is to interface with the OS in the most efficient way possible. – Frunsi Jan 10 '12 at 00:39
  • Thing is, the code I have is really quite simple. Each subclass is much the same, with just different URLs, names, values, etc., and occasionally a different way of processing the data. They do not depend on each other at all. All I want is to run them concurrently, wait for them to complete their work and then exit. All the solutions I've read about are for more complex things, I think. I can't believe nobody has developed a module for simple asynchronous/threaded execution of classes or functions that don't depend on each other at all. – Gargauth Jan 10 '12 at 00:49
  • @Andrew: All the required code & framework exists, you just have to use it now ;) – Frunsi Jan 10 '12 at 01:17
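The `select` loop described in the comments above can be sketched with a socket pair standing in for a real site connection; the payload and the setup are invented for illustration:

```python
import select
import socket

# A connected pair of sockets stands in for one real site connection:
# `remote` plays the website, `site_sock` is our end.
site_sock, remote = socket.socketpair()
remote.sendall(b"response data")  # pretend the site answered

pending = {site_sock}
received = {}
while pending:
    # Block until any watched socket has data, then handle whichever is ready.
    readable, _, _ = select.select(list(pending), [], [], 1.0)
    for sock in readable:
        received[sock] = sock.recv(4096)
        pending.discard(sock)

print(received[site_sock])  # b'response data'
site_sock.close()
remote.close()
```

With 50 real connections, the same single loop would watch all 50 sockets at once instead of parking 50 threads.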

The solution is to replace code like this:

1) Do something.
2) Wait for something to happen.
3) Do something else.

With code like this:

1) Do something.
2) Arrange it so that when something happens, something else gets done.
3) Done.

Somewhere else, you have a few threads that do this:

1) Wait for anything to happen.
2) Handle whatever happened.
3) Go to step 1.

In the first case, if you're waiting for 50 things to happen, you have 50 threads sitting around waiting for 50 things to happen. In the second case, you have one thread waiting around that will do whichever of those 50 things need to get done.

So, don't use a thread to wait for a single thing to happen. Instead, arrange it so that when that thing happens, some other thread will do whatever needs to get done next.
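A minimal sketch of this pattern with the standard-library `queue` module; the event strings and the "handle it" step are invented for illustration:

```python
import queue
import threading

events = queue.Queue()
results = []
lock = threading.Lock()

def worker():
    # Step 1: wait for anything to happen; step 2: handle whatever
    # happened; step 3: go back to step 1.
    while True:
        item = events.get()
        if item is None:  # sentinel: time to shut down
            events.task_done()
            return
        with lock:
            results.append(item.upper())  # placeholder "handling" work
        events.task_done()

# A few threads service events for ALL sites, instead of one thread per site.
workers = [threading.Thread(target=worker) for _ in range(4)]
for t in workers:
    t.start()

for event in ["site-a done", "site-b done", "site-c done"]:  # hypothetical events
    events.put(event)
events.join()  # wait until every queued event has been handled

for _ in workers:
    events.put(None)
for t in workers:
    t.join()

print(sorted(results))  # ['SITE-A DONE', 'SITE-B DONE', 'SITE-C DONE']
```
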

David Schwartz

I'm no expert on Python, but maybe have a few thread pools which control the total number of active threads, and hand a 'request' off to a thread once it's done with the previous one. The request doesn't have to be a full thread object, just enough data to complete whatever the request is.

You could also structure it so that thread pool A, with N threads, pings the websites and, once the data is retrieved, hands it off to thread pool B, with Y threads crunching the data.
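A sketch of that two-pool layout with `concurrent.futures` (which entered the standard library after this question was asked); the fetch and crunch functions, the site names, and the pool sizes are all placeholders:

```python
from concurrent.futures import ThreadPoolExecutor

def ping_site(site):
    # Placeholder for the real retrieval step done by pool A.
    return site, f"raw data from {site}"

def crunch(payload):
    # Placeholder for the real processing step done by pool B.
    site, data = payload
    return site, data.upper()

sites = ["site-a", "site-b"]  # hypothetical site names

with ThreadPoolExecutor(max_workers=5) as pool_a, \
     ThreadPoolExecutor(max_workers=2) as pool_b:
    fetched = pool_a.map(ping_site, sites)        # N threads pinging websites
    crunched = dict(pool_b.map(crunch, fetched))  # Y threads crunching data

print(crunched)
```

The pool sizes cap the number of live threads (and so the stack memory) regardless of how many sites are queued.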

Drizzt321