
Could you please help me with a memory problem?

I got the MemoryError below:

File "pandas_libs\algos_common_helper.pxi", line 361, in pandas._libs.algos.ensure_int64 MemoryError

Then I printed the memory size of all my variables using the code below:

import sys

def sizeof_fmt(num, suffix='B'):
    ''' by Fred Cirera,  https://stackoverflow.com/a/1094933/1870254, modified'''
    for unit in ['','Ki','Mi','Gi','Ti','Pi','Ei','Zi']:
        if abs(num) < 1024.0:
            return "%3.1f %s%s" % (num, unit, suffix)
        num /= 1024.0
    return "%.1f %s%s" % (num, 'Yi', suffix)

print('Memory size of each variable:')
for name, size in sorted(((name, sys.getsizeof(value)) for name, value in locals().items()),
                         key=lambda x: -x[1])[:10]:
    print("{:>30}: {:>8}".format(name, sizeof_fmt(size)))

Memory size of each variable:
                      df_baker: 572.6 MiB
                       df_hall: 37.5 MiB
                 df_WSGT_baker: 12.1 KiB
                  df_B12_baker: 12.1 KiB
                  df_WSGT_hall:  7.7 KiB
                   df_B12_hall:  7.7 KiB
                      __file__:  178.0 B
               __annotations__:  136.0 B
                         MyWho:   72.0 B
                    sizeof_fmt:   72.0 B

The largest variable is only about 570 MiB (the pandas DataFrame df_baker), and I have 5 GB of memory, so why do I get a MemoryError? Thanks for your help. I appreciate it.

roudan
  • I agree with the answer below, and it would be helpful to see your code to debug this error rather than the memory dump of variables. You could be doing something that loads things to memory, but not in a variable you created directly. – cmxu Jan 08 '20 at 16:52

3 Answers


I don't think you measure what you think you measure.

To quote the docs for sys.getsizeof():

Return the size of an object in bytes. The object can be any type of object. All built-in objects will return correct results, but this does not have to hold true for third-party extensions as it is implementation specific.

Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to.

(Emph. mine)

So anything that is not referenced directly from locals(), including everything that functions allocate internally, is not shown here.
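To see what the docs mean, here is a minimal stdlib sketch: a list that keeps roughly 10 MB of string data alive reports only a few hundred bytes, because getsizeof() counts just the list's own pointer array:

```python
import sys

# ~10 MB of string data held alive by one list
big_strings = ["x" * 1_000_000 for _ in range(10)]

# getsizeof() counts only the list object itself (its pointer array),
# not the strings it refers to.
shallow = sys.getsizeof(big_strings)
deep = shallow + sum(sys.getsizeof(s) for s in big_strings)

print(shallow)  # a few hundred bytes
print(deep)     # roughly 10 MB
```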

There are other tools to look at the Python heap. Understanding memory consumption when the reference graph is not a tree is not easy, and I bet there are lots of multiple links to the same objects on the heap.

Anyway, the ground truth is the RSS size of your Python process (or its equivalent in Windows). This is the amount actually allocated, including everything intermediate, malloc'd by C code (which is plentiful in pandas / numpy), etc.
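On Linux or macOS you can also read the peak RSS from inside Python with the stdlib resource module (a sketch; note the unit is platform-dependent: KiB on Linux, bytes on macOS):

```python
import resource  # Unix-only stdlib module

# Peak resident set size of the current process so far.
peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(peak)  # KiB on Linux, bytes on macOS
```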

9000
  • Thanks. How do I get the RSS size of the Python process, as you said? Thanks – roudan Jan 08 '20 at 18:42
  • In your terminal under Linux or macOS, run `top`, locate the process you're running (likely `python` + something), and look at the `RSS` column. Or run a GUI utility that shows running tasks and do the same there. Sorting by RSS helps you easily find the heaviest processes. RSS is the amount of RAM currently used by the process ("resident size"), as opposed to VSS, which includes virtual memory not necessarily occupying any physical RAM at a particular moment. (I don't know about Windows; Task Manager shows numbers in a way I can't always make sense of, but it should be helpful too, it just needs reading some docs.) – 9000 Jan 08 '20 at 18:55
  • Thank you 9000 so much! – roudan Jan 08 '20 at 19:03

Pandas gives you the memory usage you are after, including the memory used by Python objects in your columns, commonly strings. As an example:

> import sys
> import pandas as pd
> df = pd.DataFrame({'a': ['a' * 100] * 100})
> sys.getsizeof(df)
15852
> df.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 1 columns):
a    100 non-null object
dtypes: object(1)
memory usage: 15.5 KB

The call to info() gets you the total memory; you need the memory_usage='deep' parameter since the default gives you the shallow memory usage, not accounting for the strings.
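If you want a per-column breakdown instead of the info() summary, DataFrame.memory_usage() gives the same numbers; deep=True follows the Python string objects, while the default only counts the 8-byte object pointers (same toy frame as above):

```python
import pandas as pd

df = pd.DataFrame({'a': ['a' * 100] * 100})

print(df.memory_usage(deep=True))  # counts the string payloads
print(df.memory_usage())           # shallow: pointers only
```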

mdurant

To display the RSS memory in megabytes at any point in a program, this function can be used (after import psutil, os):

import os
import psutil

def usage():
    # RSS (resident set size) of the current process, formatted in MiB
    process = psutil.Process(os.getpid())
    return f'{process.memory_info()[0] / float(2 ** 20):,.1f}' + ' MB'
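As a quick usage sketch (assuming psutil is installed), calling it around a large allocation shows process-level growth that per-variable getsizeof() can miss:

```python
import os
import psutil

def usage():
    # RSS of the current process, formatted in MiB
    process = psutil.Process(os.getpid())
    return f'{process.memory_info()[0] / float(2 ** 20):,.1f}' + ' MB'

print('before:', usage())
junk = bytearray(50 * 2 ** 20)  # hold ~50 MB alive
print('after: ', usage())
```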