2

This is a programming problem I come across very often and was wondering whether there is a data structure, either in the C++ STL or one I can implement myself which provides both random and sequential access.

An example of why I might need this:

  • Say there are n types of items, (n = 1000000, for example), and there's a fixed number of each type of item (for example, 0 or 10)

  • I store these items into an array, where the array index represents the type of the item, and the value represents how many items of that given type are there

  • Now, I have an algorithm which iterates over all EXISTING items. To obtain these items, it is very wasteful to iterate over the entire array when all the entries are 0, except for i.e. Array[99999] and Array[999999].

Normally, I solve this by using a linked list which saves the indices of all the nonzero array entries. I implement the standard operations in this way:

Insert(int t):

1) If Array[t] == 0, LinkedList.push_back(t);

2) Array[t]++;

Delete(int t):

1) If Array[t] == 1, find and remove t from LinkedList;

2) Array[t]--;

If I want O(1) complexity for the deletion operation, I make the array store containers instead of integers. Each container contains an integer and a pointer to the respective element of the LinkedList, so I don't have to search through the list.

I would love to know whether there is a data structure which formalizes/improves this approach, or whether there's a better way to do this altogether.

Yu Hao
  • 119,891
  • 44
  • 235
  • 294
user129186
  • 1,156
  • 2
  • 14
  • 30
  • 3
    I think using an unordered_map (hash map with O(1) average find), instead of a linked list will achieve the running times you desire. – Untitled123 Dec 27 '15 at 15:34
  • Do you need the iteration to be in order? If not, how about `std::unordered_map` from type to count? – Alan Stokes Dec 27 '15 at 15:35
  • std::map has the issue of having slower run times compared to an array and unordered_map combination – Untitled123 Dec 27 '15 at 15:37
  • There is a trade of between memory complexity and runtime complexity. – Jarod42 Dec 27 '15 at 15:39
  • If you are in need of fast iterations through the entire set (as said in bullet-point 3) Try using a map/unordered_map containing the Objects, and then reference them all in a vector (or similiar, fast iterative dynamic array) as Object* . This allows you to quickly access members by both a key, as well as iterate quickly though all members based on index. – Lilith Daemon Dec 27 '15 at 15:41
  • By "sequential", do you mean that elements are ordered from, say, smallest to largest, or that they're ordered in the same order they were inserted? – Emile Cormier Dec 27 '15 at 15:52
  • By sequential I mean I want to iterate over every single one which is nonzero, in no particular order. – user129186 Dec 27 '15 at 15:57
  • Not sure if this is a duplicate, since sparse arrays are not part of the standard library and have to be "emulated". – Emile Cormier Dec 27 '15 at 16:47

1 Answers1

6

Given the following requirements:

  • Random access
  • Fast lookups
  • Fast insertions
  • Fast removals
  • Avoid wasted space

then you probably want something called a sparse array. Sparse arrays are not part of the standard library, so you'll have to emulate your own, using a std::map or std::unordered_map. In a sparse array, only non-zero elements occupy space in the collection.

An ordered_map will have O(1) lookups, insertions, and removals, but does not provide ordered iteration. A map will generally have slower operations, but will provide ordered iteration. I'm oversimplifying things when I say std::map is slower, as it depends on the number of elements and usage patterns (a topic probably already discussed in another question).

If you must absolutely have both O(1) lookups and ordered iteration, then you can combine both a map and ordered_map and keep them in sync. At that point, you'll want to consider using Boost.MultiIndex.

Here's a rough sketch showing how you can implement your own sparse vector class:

class SparseVector
{
public:
    int get(size_t index) const
    {
        auto kv = map_.find(index);
        return (kv == map_.end()) ? 0 : kv->second;
    }

    void put(size_t index, int value)
    {
        if (value == 0)
            map_.erase(index);
        else
            map_.emplace(index, value);
    }

    // etc...

private:
    std::unordered_map<size_t, int> map_;
};

In such a sparse vector class, you can overload operator[] if you wish to allow something like sparseVec[42] = 123.

Linear algebra libraries, such as Eigen or Boost.uBlas, already provide templates for sparse vectors and sparse matrices.

Emile Cormier
  • 28,391
  • 15
  • 94
  • 122