7

I would like to ask what is the cheapest data type (in term of memory consumption and cost to hold/process it) to be used as dummy value in python dict (only key of the dict matters to me, values are just placeholder)

For examples :

d1 = {1: None, 2: None, 3: None}
d2 = {1: -1, 2: -1, 3: -1}
d3 = {1: False, 2: False, 3: False}

Here only keys (1, 2, 3) are useful to me, the values are not so they can be any value (just used as a place holder. What I want to know is what dummy data I should use here. For now I use None, but not sure if it is the "cheapest" one.

P.S., I know the best option to store only keys may be to use Set instead of dict (with dummy values). However, the reason that I am doing so is because I want to exchange data between Python and C++ using SWIG. And for now I have figured out how to pass Python dict to C++ as std::map using SWIG, but cannot find anything about how to pass Python Set to C++ as std::set...

Helps / Guidances are very appreciated here!

Yu Wang
  • 73
  • 4
  • 2
    hint: `sys.getsizeof(None)` returns 16. – Jean-François Fabre Sep 24 '18 at 15:13
  • 5
    @Jean-FrançoisFabre But there is only one `None` object, and it is almost certainly present anyway. So the size of it is irrelevant. – Martin Bonner supports Monica Sep 24 '18 at 15:17
  • 1
    @MartinBonner yes that doesn't matter much – Jean-François Fabre Sep 24 '18 at 15:18
  • 4
    Are you asking "which of these data structures take up the most space in the Python process' memory?" or "which of these data structures has the smallest size in-transit when serialized and sent to another process?"? I suspect these questions have different answers. – Kevin Sep 24 '18 at 15:18
  • 1
    @Kevin I don't think the OP is talking about cross-process calling - but calling between C++ and Python within a single process. – Martin Bonner supports Monica Sep 24 '18 at 15:21
  • 1
    Interesting question, but almost certainly premature optimisation – Chris_Rands Sep 24 '18 at 15:30
  • @Kevin, I think for SWIG, the second one (smallest size in-transit) is more relevant. Thanks for clarification. – Yu Wang Sep 24 '18 at 15:46
  • @Martin Bonner, What I did is to use SWIG to create a Python module, which wraps a Mixed-Integer-Programming (MIP) model (formulated by Ilog CPLEX C++ API) insides it, and using Python to handle the data manipulation (cleaning, filtering, ...). Then send the data as parameters to the module to construct the MIP model and solve it (which will be extremely slow if using Python). I guess it is a single process, but not very sure. Anyway, I hope this help explain the situation more. And thanks for your comment. – Yu Wang Sep 24 '18 at 15:56
  • @Chris_Rands, Yes, it may be too early to worry about, but I just want to learn more, and want to do things in the correct way from the beginning. :-). Thanks for your comment. – Yu Wang Sep 24 '18 at 15:59
  • If you've figured out `std::map` passing to SWIG (using std_map.i), you can use a set (using std_set.i). – Mark Tolonen Sep 28 '18 at 05:12

2 Answers2

6

python 3.4 64bit:

>>> import sys
>>> sys.getsizeof(None)
16
>>> sys.getsizeof(False)
24
>>> sys.getsizeof(1)
28
>>> 

So None would appear to be the best choice (I've only listed immutable objects, and disregarded strings and tuples). Note that it doesn't matter much as those objects are usually cached, so the size isn't multiplied by the number of elements of your dictionary (furthermore None is guaranteed to be a singleton)

That said, the cost of the actual object is neglectable compared to the cost of storing a reference to that object for each key/value pair. If your dictionary holds 1000 values, you have 1000 references to store, whatever the size of the value.

Conclusion: it doesn't matter much as long as you're using the same reference everywhere, and it's going to cost much more than a set anyway because of the references being stored as the values of each dictionary entry.

One possible alternative would be to pass the set as json representation (in a list, then) as a pointer of characters to the C++ side, which will parse it using a good json parser. Unless your values are big floating point values (or huge integers), that would save memory all right because the object aspect is eliminated with the serialization.

>>> json.dumps(list(set(range(4,10))))
'[4, 5, 6, 7, 8, 9]'  # hard to beat that in terms of size!
Jean-François Fabre
  • 137,073
  • 23
  • 153
  • 219
1

You can use a set, but SWIG seems to only support passing Python lists as the set parameter (or use the named template) without writing your own typemap. Example (Windows):

test.i*

%module test

%include <std_set.i>
%template(seti) std::set<int>;

%inline %{

#include <set>
#include <iostream>
void func(std::set<int> a)
{
    for(auto i : a)
        std::cout << i << std::endl;
}

%}

Output:

>>> import set
>>> s = test.seti([1,1,2,2,3,3])  # pass named template
>>> test.func(s)
1
2
3
>>> test.func([1,2,3,3,4,4])  # pass a list that converts to a set
1
2
3
4
>>> test.func({1,1,2,2,3})   # Actual set doesn't work.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: in method 'func', argument 1 of type 'std::set< int,std::less< int >,std::allocator< int > >'
Mark Tolonen
  • 166,664
  • 26
  • 169
  • 251