how to use python to built an int arrray and manuplate it as efficient as C?

Question

From http://www.cs.bell-labs.com/cm/cs/pearls/sol01.html:

The C code is like that:

#define BITSPERWORD 32
#define SHIFT 5
#define MASK 0x1F
#define N 10000000
int a[1 + N/BITSPERWORD];

void set(int i) {        a[i>>SHIFT] |=  (1<<(i & MASK)); }
void clr(int i) {        a[i>>SHIFT] &= ~(1<<(i & MASK)); }
int  test(int i){ return a[i>>SHIFT] &   (1<<(i & MASK)); }

I found ctypes, BitArrays,numpy but I'm not sure whether they could be as efficient as the C codes above.

For example, if I write codes like this:

from ctypes import c_int
a=[c_int(9)]*1024*1024

Would the space used be 1M Bytes, or much more?

Does anyone know some good libraries that can do the same things in Python?

What is your use case for which you think the current implementations are not efficient? — Burhan Khalid, Aug 15 '12 at 18:25
@BurhanKhalid I added more descriptions for the question, would `a=[c_int(9)]*1024*1024` use more than `1M Bytes`? — Hanfei Sun, Aug 15 '12 at 18:29
To clarify: when you say "efficient" what do you mean? Do you want to minimize wall time, minimize CPU time, minimize memory usage, or something different? — Daniel Pryden, Aug 15 '12 at 18:29
@DanielPryden Mainly for memory usage, for example, will `a=[c_int(9)]*1024*1024` use more than `1M Bytes`? — Hanfei Sun, Aug 15 '12 at 18:32
You wouldn't need to use multiple ints anyway because in Python an `int` can be (virtually) any length. — MRAB, Aug 15 '12 at 18:36
@MRAB I want to make an array whose length is `1024*1024`, understand? — Hanfei Sun, Aug 15 '12 at 18:37
@Firegun: Don't stress. I think MRAB's point is that if what you really want is an addressable set of arbitrarily many bits, an array is not required. A lot of us at StackOverflow are used to questions that have an [X-Y Problem](http://meta.stackexchange.com/a/66378/138089), and so it's common for comments on the question to try to draw out what the *real* problem is. That said, if your question really is whether numpy arrays store ints in contiguous memory, the answer is yes. See my updated answer below for more information. — Daniel Pryden, Aug 15 '12 at 19:28

score 3 · Answer 1 · edited May 23 '17 at 10:24

Numpy or ctypes are both good choices. But are you sure your Python code really needs to be as efficient as C, and are you sure this code is a performance hotspot?

The best thing to do is to use the Python profiler to ensure that this code really does need to be as efficient as C. If it truly does, then it will probably be easiest to just keep the code in C and link to it using something like ctypes or SWIG.

Edit: To answer your updated question, a numpy array of size N with element size M will contain N*M bytes of contiguous memory, plus a header and some bytes for views.

Here are a couple of related links:

score 2 · Answer 2 · answered Aug 15 '12 at 18:29

you can also check the built-in array module:

>>> import array
>>> help(array)
Help on built-in module array:

NAME
    array

FILE
    (built-in)

DESCRIPTION
    This module defines an object type which can efficiently represent
    an array of basic values: characters, integers, floating point
    numbers.  Arrays are sequence types and behave very much like lists,
    except that the type of objects stored in them is constrained.  The
    type is specified at object creation time by using a type code, which
    is a single character.  The following type codes are defined:

        Type code   C Type             Minimum size in bytes 
        'b'         signed integer     1 
        'B'         unsigned integer   1 
        'u'         Unicode character  2 (see note) 
        'h'         signed integer     2 
        'H'         unsigned integer   2 
        'i'         signed integer     2 
        'I'         unsigned integer   2 
        'l'         signed integer     4 
        'L'         unsigned integer   4 
        'f'         floating point     4 
        'd'         floating point     8

score 2 · Answer 3 · answered Aug 15 '12 at 18:52

This:

a=[c_int()]

makes a list which contains a reference to a c_int object.

Multiplying the list merely duplicates the references, so:

a = [c_int()] * 1024 * 1024

actually creates a list of 1024 * 1024 references to the same single c_int object.

If you want an array of 1024 * 1024 c_ints, do this:

a = c_int * (1024 * 1024)

how to use python to built an int arrray and manuplate it as efficient as C?

3 Answers3