
I am curious to know how memory management differs between bytearray and list in Python.

I have found a few questions like Difference between bytearray and list, but none that exactly answer my question.

My question, precisely:

>>> from array import array
>>> x = array("B", (1,2,3,4))
>>> x.__sizeof__()
36
>>> y = bytearray((1,2,3,4))
>>> y.__sizeof__()
32
>>> z = [1,2,3,4]
>>> z.__sizeof__()
36

As we can see, there is a difference in size between list/array.array (36 bytes for 4 elements) and bytearray (32 bytes for 4 elements). Can someone explain why this is? It makes sense to me for bytearray that it occupies 32 bytes of memory for 4 elements (4 * 8 == 32), but how can this be interpreted for list and array.array?

# Let's take the case of bytearray (which makes more sense to me, at least :p)
for i in y:
    print(i, ": ", id(i))

1 :  499962320
2 :  499962336 #diff is 16 units
3 :  499962352 #diff is 16 units
4 :  499962368 #diff is 16 units

Why do the ids of two consecutive elements differ by 16 units here, when each element occupies only 8 bytes? Does that mean each memory address pointer points to a nibble?

Also, what is the criterion for memory allocation for an integer? I read that Python assigns more memory based on the value of the integer (correct me if I am wrong), i.e. the larger the number, the more memory.

Eg:

>>> y = 10
>>> y.__sizeof__()
14
>>> y = 1000000
>>> y.__sizeof__()
16
>>> y = 10000000000000
>>> y.__sizeof__()
18
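On a 64-bit CPython the exact numbers differ, but the same stepwise pattern shows up. A sketch to probe it (my addition, not part of the original session; the digit sizes are an assumption about your build):

```python
import sys

# CPython stores an int as a variable-length array of "digits":
# 15-bit digits (2 bytes each) on some 32-bit builds, 30-bit digits
# (4 bytes each) on typical 64-bit builds. The reported size therefore
# grows in fixed steps as the value needs more digits.
for v in (10, 2**15, 2**30, 2**60):
    print(v, sys.getsizeof(v))
```

The 2-byte increments in the session above (14, 16, 18) are consistent with 2-byte digits on a 32-bit build.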

What criteria does Python use to allocate memory?

And why does Python occupy so much more memory when C occupies only 8 bytes (mine is a 64-bit machine), for values that are perfectly within the range of a 64-bit integer (2 ** 64)?

Metadata:

Python version : '3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:43:06) [MSC v.1600 32 bit (Intel)]'

Machine arch : 64-bit

P.S.: Kindly guide me to a good article where Python memory management is explained better. I spent almost an hour trying to figure these things out before asking this question on SO. :(

    Good question, upvoted. Hey, you're lucky: on my Linux Xubuntu 64-bit machine CPython 3.4.3 `y.__sizeof__()` gives me `28` for `y=10`, the same for `y=1M`, `32` for `y=10000000000000` – Pynchia Oct 23 '15 at 22:02
  • Hi @Pynchia, mine is a 32-bit Python, though my machine is 64-bit. I am not sure, but that might be the reason. Let's wait for someone to clarify. – Sravan K Ghantasala Oct 23 '15 at 22:44

1 Answer


I'm not claiming this is a complete answer, but here are some hints toward understanding this.

A bytearray is a sequence of bytes, while a list is a sequence of object references. So [1,2,3] actually holds pointers to integer objects that are stored elsewhere in memory. To calculate the total memory consumption of a list structure, we can do this (I'm using sys.getsizeof from here on; it calls __sizeof__ and adds garbage-collector overhead):

>>> from sys import getsizeof
>>> x = [1,2,3]
>>> sum(map(getsizeof, x)) + getsizeof(x)
172

Results may differ between machines.

Also, look at this:

>>> getsizeof([])
64

That's because lists are mutable. To be fast, this structure allocates a contiguous memory range to store references to objects (plus some storage for metadata, such as the length of the list). When you append items, the next memory cells are filled with references to those items. When there is no room to store new items, a new, larger range is allocated, the existing data is copied there, and the old range is released. This is called a dynamic array.

You can observe this behaviour by running this code:

import sys
data=[]
n=15
for k in range(n):
    a = len(data)
    b = sys.getsizeof(data)
    print('Length: {0:3d}; Size in bytes: {1:4d}'.format(a, b))
    data.append(None)

My results:

Length:   0; Size in bytes:   64 
Length:   1; Size in bytes:   96
Length:   2; Size in bytes:   96 
Length:   3; Size in bytes:   96
Length:   4; Size in bytes:   96 
Length:   5; Size in bytes:  128
Length:   6; Size in bytes:  128 
Length:   7; Size in bytes:  128
Length:   8; Size in bytes:  128 
Length:   9; Size in bytes:  192
Length:  10; Size in bytes:  192 
Length:  11; Size in bytes:  192
Length:  12; Size in bytes:  192 
Length:  13; Size in bytes:  192
Length:  14; Size in bytes:  192

We can see that the list grows in steps: each reallocation adds room for several memory addresses (8 bytes each on a 64-bit build) at once; for example, the jump from 128 to 192 bytes reserves room for 8 more pointers.
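As a quick check (my sketch, not part of the original answer), you can confirm the pointer size on your build, since each list slot stores exactly one such pointer:

```python
import ctypes
import sys

# A C pointer is 8 bytes on 64-bit Python, 4 bytes on 32-bit builds.
ptr_size = ctypes.sizeof(ctypes.c_void_p)
print(ptr_size)

# [None] * n allocates exactly n slots in CPython, so one extra
# element costs exactly one pointer.
print(sys.getsizeof([None] * 2) - sys.getsizeof([None] * 1))
```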

Almost the same goes for bytearray() (change the second line to data = bytearray() and the last one to data.append(1)).
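Spelled out (my reading of the change just described; exact sizes depend on the CPython version), the modified snippet is:

```python
import sys

# Same experiment as above, but with a bytearray instead of a list.
data = bytearray()
n = 15
for k in range(n):
    a = len(data)
    b = sys.getsizeof(data)
    print('Length: {0:3d}; Size in bytes: {1:4d}'.format(a, b))
    data.append(1)
```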

Length:   0; Size in bytes:   56
Length:   1; Size in bytes:   58
Length:   2; Size in bytes:   61
Length:   3; Size in bytes:   61
Length:   4; Size in bytes:   63
Length:   5; Size in bytes:   63
Length:   6; Size in bytes:   65
Length:   7; Size in bytes:   65
Length:   8; Size in bytes:   68
Length:   9; Size in bytes:   68
Length:  10; Size in bytes:   68
Length:  11; Size in bytes:   74
Length:  12; Size in bytes:   74
Length:  13; Size in bytes:   74
Length:  14; Size in bytes:   74

The difference is that the memory is now used to hold the actual byte values, not pointers, so the buffer grows by roughly one byte per element.
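To make the contrast concrete, a quick sketch (my addition; exact numbers vary by build) comparing the two containers at a larger size:

```python
import sys

n = 1000
as_list = [0] * n        # n slots, each holding a pointer (8 bytes on 64-bit)
as_bytes = bytearray(n)  # n actual bytes

print(sys.getsizeof(as_list))   # header + ~8 bytes per element
print(sys.getsizeof(as_bytes))  # header + ~1 byte per element
```

Note that the list figure still excludes the integer objects the pointers refer to; those would add to the total as in the sum(map(getsizeof, ...)) calculation above.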

Hope that helps you investigate further.

anti1869
  • Hi @anti1869, thanks for your comment. It's very exhaustive and useful. But I have the following questions about your answer. I am not able to add all the info here and hence am adding another comment beneath. Thanks – Sravan K Ghantasala Oct 31 '15 at 06:33
  • It's understood for the list as per your explanation, but why did the size of the bytearray start at 56, and why is it stable after reaching 74? Also, I would be glad if you could give more info on why the initial sizes are 64 and 56. Thanks – Sravan K Ghantasala Oct 31 '15 at 06:39
  • 1
    Check out that data structure's source code. There you will see the internal container structure and what it allocates memory for upon initialization. The growth algorithm is also clearly visible there. https://github.com/python/cpython/blob/master/Objects/listobject.c – anti1869 Nov 01 '15 at 07:56