I am currently using a function that builds extremely large dictionaries (used to compare DNA strings) and sometimes I'm getting MemoryError. Is there a way to allot more memory to Python so it can deal with more data at once?
-
64-bit Python has **a lot more** memory support. I would cite real numbers, but I don't remember them (I saw this in a StackOverflow question.) – Ender Look Jun 12 '17 at 20:36
-
I'm comparing strings 3-5 million characters in length, in the process creating a dictionary for each containing roughly as many keys as its length. Does that count as a lot? – Maor Jun 12 '17 at 20:40
-
@Maor That is definitely a lot. You should consider refactoring your code. – cs95 Jun 12 '17 at 20:41
-
Hey, if it's DNA, then how do these dictionaries have so many keys? – enedil Jun 12 '17 at 20:45
-
How much RAM are you working with? Can you add details about the data *in the question itself* instead of in the comments? Elaborate a bit more. If it is a 32-bit version of Python, you might benefit greatly by going 64-bit. Depends. – juanpa.arrivillaga Jun 12 '17 at 20:49
-
Bear in mind that Python objects incur some memory overhead on top of the "raw" data size. An empty string in 32 bit Python 3 consumes 25 bytes, each additional ASCII char will add 1 byte. If you use `bytes` strings instead the cost of an empty `b''` drops to 17 bytes. You can get this info via the `sys.getsizeof` function. Python 3.6 dicts are more space-efficient than previous versions, but they still have some unavoidable overheads. – PM 2Ring Jun 12 '17 at 21:04
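To see the overhead described above, here is a quick illustrative check (exact sizes vary by Python version, build, and platform):

    import sys

    # Sizes are illustrative; they differ between 32-bit and 64-bit builds
    # and between Python versions.
    print(sys.getsizeof(''))        # empty str
    print(sys.getsizeof('ACGT'))    # +1 byte per ASCII character
    print(sys.getsizeof(b''))       # an empty bytes object is smaller
    print(sys.getsizeof({}))        # empty dict
    print(sys.getsizeof({'k': 1}))  # dict with one entry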
-
Python libraries like [resource](https://docs.python.org/3/library/resource.html) _can_ impose a limit, though. – matanster Jun 07 '19 at 17:31
4 Answers
Python doesn't limit the amount of memory your program can use. It will allocate as much memory as your program needs until your computer runs out of it. The most you can do is impose a fixed upper cap, which can be done with the resource module, but that isn't what you're looking for.
You'd need to look at making your code more memory- and performance-friendly.
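For completeness, here is a minimal sketch of what such a cap looks like with the resource module (Unix only; the 1 GiB figure is an arbitrary example):

    import resource

    # Unix only: cap this process's virtual address space. Allocations
    # beyond the soft limit will fail, typically raising MemoryError.
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    print('current limits:', soft, hard)

    one_gib = 1024 ** 3  # arbitrary example cap
    resource.setrlimit(resource.RLIMIT_AS, (one_gib, hard))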

-
Or until the limit in the OS is reached (e.g. on Linux you can easily impose limits via configuration) – matanster Jun 07 '19 at 17:31
Python raises MemoryError when it exhausts your system's RAM, unless you've set a lower limit manually with the resource package.
Defining your class with __slots__ tells the Python interpreter that the attributes/members of your class are fixed, and it can lead to significant memory savings!
Declaring __slots__ stops the interpreter from creating a per-instance __dict__, so every instance reuses the same fixed attribute layout instead.
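A minimal sketch of the difference, with class names invented purely for illustration:

    import sys

    class DnaPlain:
        def __init__(self, key, value):
            self.key = key
            self.value = value

    class DnaSlotted:
        __slots__ = ('key', 'value')  # fixed attributes, no per-instance __dict__
        def __init__(self, key, value):
            self.key = key
            self.value = value

    plain = DnaPlain('ACGT', 1)
    slotted = DnaSlotted('ACGT', 1)

    print(hasattr(plain, '__dict__'))    # True
    print(hasattr(slotted, '__dict__'))  # False: attributes live in fixed slots
    print(sys.getsizeof(plain) + sys.getsizeof(plain.__dict__))
    print(sys.getsizeof(slotted))        # noticeably smaller per instance

With millions of instances, dropping the per-instance __dict__ adds up quickly.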
If the memory consumed by your Python process keeps growing over time, this is usually a combination of:
- How the C memory allocator in Python works. This is essentially memory fragmentation: the allocator cannot call free unless the entire memory chunk is unused, and chunk usage is usually not perfectly aligned to the objects you are creating and using.
- Creating large numbers of small strings to compare the data. Python interns some strings internally, but churning through many short-lived strings still puts load on the interpreter (see the sys.intern sketch after this list).
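If many of your dictionary keys repeat (fixed-length DNA substrings, for example), sys.intern can deduplicate the underlying string objects; a small sketch, with the substring length chosen arbitrarily:

    import sys

    sequence = 'ACGTACGTACGT'
    k = 4  # arbitrary substring length for illustration

    # In CPython, each slice normally creates a fresh string object, even
    # when the same 4-character substring occurs many times. Interning
    # makes repeated substrings share a single object.
    kmers = [sys.intern(sequence[i:i + k]) for i in range(len(sequence) - k + 1)]

    print(kmers[0] is kmers[4])  # True: both 'ACGT' refer to the same object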
The best approach is to create a worker thread, or a single-threaded pool, to do the work, and then invalidate or kill the worker to free the resources it held.
The code below creates a single-threaded worker:
import concurrent.futures
import logging
import threading

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

lock = threading.Lock()
errorResultMap = []

def process_dna_compare(dna1, dna2):
    # max_workers=1 creates a single-threaded pool
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        futures = {executor.submit(getDnaDict, lock, dna_key): dna_key
                   for dna_key in dna1}
        dna_differences_map = {}  # collect per-key differences here
        count = 0
        for future in concurrent.futures.as_completed(futures):
            result_dict = future.result()
            if result_dict:
                count += 1
                # Do your processing XYZ here
        logger.info('Total dna keys processed: %s', count)

def getDnaDict(lock, dna_key):
    '''Process dna_key here and return the item.'''
    try:
        dataItem = item[0]  # 'item' stands in for your own lookup/processing
        return dataItem
    except Exception:
        with lock:  # guard the shared error list across threads
            errorResultMap.append({'dna_key': dna_key,
                                   'error': 'No data for dna found'})
        logger.error('Error in processing dna: %s', dna_key)

if __name__ == "__main__":
    dna1 = '...'  # get data for dna1
    dna2 = '...'  # get data for dna2
    process_dna_compare(dna1, dna2)
    if errorResultMap:
        print(errorResultMap)  # or write errorResultMap to a file
The code below will help you understand memory usage:
import objgraph
import random

class Dna(object):
    def __init__(self):
        self.val = None

    def __str__(self):
        return "dna - val: {0}".format(self.val)

def f():
    l = []
    for i in range(3):
        dna = Dna()
        # print("id of dna: {0}".format(id(dna)))
        # print("dna is: {0}".format(dna))
        l.append(dna)
    return l

def main():
    d = {}
    l = f()
    d['k'] = l
    print("list l has {0} objects of type Dna()".format(len(l)))
    objgraph.show_most_common_types()
    objgraph.show_backrefs(random.choice(objgraph.by_type('Dna')),
                           filename="dna_refs.png")
    objgraph.show_refs(d, filename='myDna-image.png')

if __name__ == "__main__":
    main()
Output for memory usage:
list l has 3 objects of type Dna()
function 2021
wrapper_descriptor 1072
dict 998
method_descriptor 778
builtin_function_or_method 759
tuple 667
weakref 577
getset_descriptor 396
member_descriptor 296
type 180
For more reading on slots, see: https://elfsternberg.com/2009/07/06/python-what-the-hell-is-a-slot/
Although Python doesn't limit memory usage in your program, the OS imposes dynamic CPU and RAM limits on every process, for good power usage and performance of the machine as a whole.
I work on graphical fractal generation, which needs character arrays with as many billions of elements as possible for fast generation. I found that there is a soft limit per Python process, set by the OS, that decreases the real performance of the algorithm on the machine.
When you increase RAM and CPU usage (bigger buffers, longer loops, more threads), the total processing speed decreases: the number of useful threads and the effective thread frequency drop, so the result takes longer to compute. But if you reduce resource usage to 50...75% of the old configuration (smaller buffer size, smaller loops, fewer threads, lower frequency or a longer threading timer), split the task into multiple parts, and run multiple Python console programs to process all the parts at the same time, the job finishes in much less time. When you then check CPU and RAM usage, the combined processes reach much higher utilization than the old single-program, multi-threaded method.
This means that when we build a program for high performance, speed, and giant data processing, we should design it as multiple background programs, each running multiple threads. Also, bring disk space into the process rather than relying on memory alone.
Even before you reach the physical memory limit, dealing with giant data all at once is slower and more limiting for the application than splitting it into several parts.
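A minimal sketch of that split-the-work idea using the standard multiprocessing module; the chunking helper, worker count, and per-chunk processing here are assumptions for illustration:

    import multiprocessing as mp

    def process_chunk(chunk):
        # Placeholder: build the per-chunk dictionary / run the comparison here.
        return {s: len(s) for s in chunk}

    def split(data, n_parts):
        # Divide the work into roughly equal slices.
        step = max(1, len(data) // n_parts)
        return [data[i:i + step] for i in range(0, len(data), step)]

    if __name__ == '__main__':
        data = ['ACGT' * i for i in range(1, 1001)]  # stand-in for real sequences
        parts = split(data, 4)
        with mp.Pool(processes=4) as pool:  # separate processes, not threads
            results = pool.map(process_chunk, parts)
        merged = {}
        for partial in results:
            merged.update(partial)
        print('processed', len(merged), 'keys')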
Optionally, if your program is a specialized application, consider the following:
- Use the graphics card's compute capability: OpenGL with graphics acceleration for Intel graphics, or CUDA for NVIDIA.
- Switch to a 64-bit OS and 64-bit Python.
- Switch back to an older operating system (but not too old), such as Windows 7 64-bit or an older Ubuntu/Linux, because newer OSes carry more costly luxury features and services that consume computer resources. You will see a dramatic improvement in Python's speed when using an SSD drive and switching from Windows 11/10 back to 7.
- Switch to Safe Mode if you use Windows.
Creating multiple background programs allows you to build a high-speed graphics application that does not require the user to install a graphics card. This is also a good choice if you want your program to be easy to distribute to end users while integrating AI and costly compute power: it's cheaper to upgrade RAM in the 4-16 GB range than to buy a graphics card.
##//////////////////////////////////
##////////// RUNTIME PACK ////////// FOR PYTHON
##//
##// v2021.08.12 : add Handler
##//
##// Module by Phung Phan: phnghue@gmail.com
##
##
import time
import threading

# JS equivalent: var Handler = function(){this.post = function(r){r();};return this;}; Handler.post = function(r){r();}
ERUN = lambda: 0  # no-op runnable

def run(R): R()
def Run(R): R()
def RUN(R): R()

def delay(ms):
    time.sleep(ms / 1000.0)  # control loop speed

def delayF(R, delayMS):
    # Run R once after delayMS milliseconds (JS setTimeout-style).
    t = threading.Timer(delayMS / 1000.0, R)
    t.start()
    return t

def setTimeout(R, delayMS):
    t = threading.Timer(delayMS / 1000.0, R)
    t.start()
    return t

class THREAD:
    def __init__(self):
        self.R_onRun = None
        self.thread = None

    def run(self):
        self.thread = threading.Thread(target=self.R_onRun)
        self.thread.start()

    def isRun(self):
        return self.thread.is_alive()  # isAlive() was removed in Python 3.9

AInterval = []

class setInterval:
    # Call R_onRun every msInterval milliseconds (JS setInterval-style).
    def __init__(self, R_onRun, msInterval):
        self.ms = msInterval
        self.R_onRun = R_onRun
        self.kStop = False
        self.kPause = False
        self.thread = THREAD()
        self.thread.R_onRun = self.Clock
        self.thread.run()
        self.id = len(AInterval)
        AInterval.append(self)

    def Clock(self):
        while not self.kPause:
            self.R_onRun()
            delay(self.ms)

    def pause(self):
        self.kPause = True

    def stop(self):
        self.kPause = True
        self.kStop = True
        AInterval[self.id] = None

    def resume(self):
        if self.kPause and not self.kStop:
            self.kPause = False
            self.thread.run()

def clearInterval(timer):
    timer.stop()

def clearAllInterval():
    for i in AInterval:
        if i is not None:  # 'null' is not a Python name
            i.stop()

def cycleF(R_onRun, msInterval):
    return setInterval(R_onRun, msInterval)

def stopCycleF(timer):
    if not isinstance(timer, str):  # 'Instanceof' was not defined
        try:
            timer.stop()
        except Exception:
            pass

##########
## END ### RUNTIME PACK ##########
##########

import subprocess

def process1(): subprocess.call("python process1.py", shell=True)
def process2(): subprocess.call("python process2.py", shell=True)
def process3(): subprocess.call("python process3.py", shell=True)
def process4(): subprocess.call("python process4.py", shell=True)

setTimeout(process1, 100)
setTimeout(process2, 100)
setTimeout(process3, 100)
setTimeout(process4, 100)

Try updating your Python from a 32-bit to a 64-bit build.
Simply type python
in the command line and you will see which build yours is. The memory a 32-bit Python process can address is very limited (roughly 2-4 GB of address space).
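A quick way to check whether your interpreter is a 32-bit or 64-bit build:

    import struct
    import sys

    print(struct.calcsize('P') * 8, 'bit')  # pointer size in bits: 32 or 64
    print(sys.maxsize > 2**32)              # True on a 64-bit build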
