2

I have to loop over a subset of elements in a large array where each element point to another one (problem coming from the detection of connected component in a large graph).

My algo is going as follows: 1. consider 1st element 2. consider next element as the one pointed by the previous element. 3. loop until no new element is discover 4. consider next element not already consider in 1-3, get back to 1. Note that the number of elements to consider is much smaller than the total number of elements.

For what I see now, I can either:

//create a map of all element, init all values to 0, set to 1 when consider
map<int,int> is_set; // is_set.size() will be equal to N

or

//create a (too) large array (total size), init to 0 the elements to consider
int* is_set = (int*)malloc(total_size * sizeof(int)); // is_set length will be total_size>>N

I know that accessing keys in map is O(log N) while it's only constant for arrays, but I don't know if malloc is not more costly at the creation while it also requires more memory?

josephthomas
  • 3,256
  • 15
  • 20
Arno
  • 331
  • 2
  • 4
  • 8
  • 1
    Not so much a solution, could you could do `int* is_set = new int[total_size];` instead of using malloc. – josephthomas May 03 '12 at 16:44
  • @josephthomas does it changes anything since it's just an allocation of block memory? – Arno May 03 '12 at 16:45
  • 1
    In your case, no, but your question is tagged as C++, and that is more of the `C++ style`. There is a difference between malloc and new though. – josephthomas May 03 '12 at 16:52
  • @josephthomas: the differences between `new` and `malloc` don't really matter in this case, since the OP is allocating an array of POD. – Eric Melski May 03 '12 at 22:33
  • 1
    I know, but a typical suggestion is to keep with "the coding standard specified by the language". – josephthomas May 03 '12 at 22:35
  • [edit: oops, just noticed the age of this, but the questions remain:] Why are you using `int` for yes/no values instead of `bool`? What are all those ugly casts for? This is tagged C++ but quickly seems to revert to basic C. – underscore_d Nov 30 '15 at 08:26

3 Answers3

9

When in doubt, measure the performance of both alternatives. That's the only way to know for sure which approach will be fastest for your application.

That said, a one-time large malloc is generally not terribly expensive. Also, although the map is O(log N), the big-O conceals a relatively large constant factor, at least for the std::map implementation, in my experience. I would not be surprised to find that the array approach is faster in this case, but again the only way to know for sure is to measure.

Keep in mind too that although the map does not have a large up-front memory allocation, it has many small allocations over the lifetime of the object (every time you insert a new element, you get another allocation, and every time you remove an element, you get another free). If you have very many of these, that can fragment your heap, which may negatively impact performance depending on what else your application might be doing at the same time.

Eric Melski
  • 16,432
  • 3
  • 38
  • 52
  • the allocation of the map is not that an issue since the elements are order and I can so initialize iteratively from the end of the map in amortized constant time. It's more the complexity of accessing the keys that is bothering me. – Arno May 03 '12 at 16:56
  • 1
    @Arno: accessing an array element is O(1), which is vastly better than the O(log n) time to access the map element. Again, your best bet is to measure. – Eric Melski May 03 '12 at 17:07
4

If indexed search suits your needs (like provided by regular C-style arrays), probably std::map is not the right class for you. Instead, consider using std::vector if you need dynamic run-time allocation or std::array if your collection is fixed-sized and you just need the fastest bounds-safe alternative to a C-style pointer.

You can find more information on this previous post.

Community
  • 1
  • 1
Gerardo Lima
  • 6,467
  • 3
  • 31
  • 47
  • yeah, why overcomplicate things? if you need a numeric key with no gaps, just use an `std::array` if size known at runtime or a `vector` if not. your key is built in. using an associative container for this is reinventing the wheel, but making it square. – underscore_d Nov 30 '15 at 08:26
1

I know that accessing keys in map is O(log N) while it's only constant for arrays, but I don't know if malloc is not more costly at the creation while it also requires more memory?

Each entry in the map is dynamically allocated, so if the dynamic allocation is an issue it will be a bigger issue in the map. As of the data structure, you can use a bitmap rather than a plain array of int's. That will reduce the size of the array by a factor of 32 in architectures with 32bit ints, the extra cost of mapping the index into the array will in most cases be much smaller than the cost of the extra memory, as the structure is more compact and can fit in fewer cache lines.

There are other things to consider, as whether the density of elements in the set is small or not. If there are very few entries (i.e. the graph is sparse) then either option could be fine. As a final option you can manually implement the map by using a vector of pair<int,int> and short them, then use binary search. That will reduce the number of allocations, incur some extra cost in sorting and provide a more compact O(log N) solution than a map. Still, I would try to go for the bitmask.

David Rodríguez - dribeas
  • 204,818
  • 23
  • 294
  • 489
  • I'm not that familiar with this approach of bitmap. I'll looking for infos and advises about that, sounds like an interesting idea. Thx – Arno May 03 '12 at 16:53
  • @Arno: There are types that implement bitmaps already (including `std::bitset` (fixed size) and the infamous `std::vector`), basically you map each value to just one bit in your internal representation. To set element N, you would update an unsigned integer at position (N/32), and set the (N%32) bit. This provides for a compact representation of the data. – David Rodríguez - dribeas May 03 '12 at 19:48