7

NumPy has an implementation of the unique algorithm, numpy.unique(), that returns:

  1. the sorted unique elements of a NumPy array (i.e. with no duplicates)

In addition, numpy.unique() can also return:

  2. the indices of the input array that give the unique values
  3. the indices of the unique array that reconstruct the input array

The C++ standard library also implements a unique algorithm (std::unique), which eliminates consecutive duplicates. Combined with std::sort() and std::vector<>::erase, this can return the sorted unique elements of the vector (output 1 of numpy.unique()).
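For concreteness, that combination can be written as follows (a small sketch; the function name is mine):

```cpp
#include <algorithm>
#include <vector>

// Sort first so duplicates become consecutive, then std::unique moves
// them past the logical end and vector::erase discards them.
// This reproduces output 1 of numpy.unique().
template <typename T>
std::vector<T> sorted_unique( std::vector<T> xs )  // by value: we sort a copy
{
  std::sort( xs.begin(), xs.end() );
  xs.erase( std::unique( xs.begin(), xs.end() ), xs.end() );
  return xs;
}
```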

My question is: is there any algorithm in the STL or elsewhere that can also return outputs 2 and 3 of numpy.unique()? If not, is there a way to implement them efficiently?

Johann Bzh
  • Thanks for sharing this question. Really interesting. Just wanted to share numpy's python implementation source here, as a potential reference (even if it's not C++): https://github.com/numpy/numpy/blob/v1.22.0/numpy/lib/arraysetops.py#L138-L317 (especially lines 351-361 of the code). – Kolyan1 Jan 26 '22 at 18:24
  • AFAIK, no, the STL does not directly provide such an algorithm. However, you can write this yourself using the STL quite easily. Are you interested in such a solution? – Jérôme Richard Jan 26 '22 at 18:30

3 Answers

2

A std::map combined with a std::vector gives you all the information you want.

#include <map>
#include <vector>

template <typename Container>
auto unique_map( const Container & xs )
-> std::map <typename Container::value_type, std::vector <std::size_t> >
{
  decltype(unique_map(xs)) S;

  std::size_t n = 0;
  for (const auto & x : xs)
  {
    S[ x ].push_back( n++ );
  }

  return S;
}

Now you can convert any container to a sorted std::map where:

  • each key is a unique element from the original container xs
  • each value is a std::vector listing indices into xs

If you wish to reconstruct xs from the mapping, you can do so easily enough:

#include <numeric>

template <typename UniqueMap>
auto rebuild_from_unique_map( const UniqueMap & um )
-> std::vector <typename UniqueMap::key_type>  // <std::size_t>
{
  decltype(rebuild_from_unique_map(um)) xs;
  if (um.size())
  {
    xs.resize( std::accumulate( um.begin(), um.end(), (std::size_t)0, 
        []( auto n, auto p ) { return n + p.second.size(); } ) );
    for (const auto & m : um)
      for (const auto & n : m.second)
        xs[n] = m.first;                       // index++
  }
  return xs;
}

If you really, really want that list of indices instead (numpy's return_inverse), you can get it by changing the commented lines as indicated and adding a std::size_t index = 0; before the nested loop, incrementing it once per key in the outer loop.
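Applied literally, those changes give something like the following (my sketch of the modified function; the per-key counter index replaces m.first and is incremented once per key):

```cpp
#include <map>
#include <numeric>
#include <vector>

// Variant of rebuild_from_unique_map with the commented changes applied:
// returns numpy's return_inverse, i.e. for each position of the original
// container, the index of its value within the sorted unique sequence.
template <typename UniqueMap>
std::vector <std::size_t> inverse_from_unique_map( const UniqueMap & um )
{
  std::vector <std::size_t> inv;
  if (um.size())
  {
    inv.resize( std::accumulate( um.begin(), um.end(), (std::size_t)0,
        []( auto n, auto p ) { return n + p.second.size(); } ) );
    std::size_t index = 0;
    for (const auto & m : um)
    {
      for (const auto & n : m.second)
        inv[n] = index;
      ++index;
    }
  }
  return inv;
}
```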

Here’s a toy to play with it. Just concatenate the three code snippets in this post and compile. You’ll need Boost for the irange.

#include <algorithm>
#include <boost/range/irange.hpp>
#include <cmath>
#include <iomanip>
#include <iostream>
#include <string>

int main( int argc, char ** argv )
{
  std::vector <int> xs( argc-1, 0 );
  std::transform( argv+1, argv+argc, xs.begin(), []( auto s ) { return std::stoi( s ); } );

  auto width = std::setw( 3 + (int)std::log10( *std::max_element( xs.begin(), xs.end() ) ) );

  std::cout << xs.size() << " integers";
  std::cout << "\nvalue:  "; for (auto x : xs)                         std::cout << " " << width << x;
  std::cout << "\nindex: ["; for (auto n : boost::irange( xs.size() )) std::cout << " " << width << n;
  std::cout << " ]\n";

  auto ys = unique_map( xs );

  std::cout << "\n" << ys.size() << " unique integers";
  std::cout << "\nsorted:"; for (auto y : ys) std::cout << " " << width << y.first;

  std::cout << "\n\n";
  for (const auto & y : ys)
  {
    std::cout << width << y.first << " appears " << y.second.size() << " times at [";
    for (const auto & n : y.second) std::cout << " " << n;
    std::cout << " ]\n";
  }
  
  std::cout << "\nrebuilt:"; for (auto x : rebuild_from_unique_map( ys )) std::cout << " " << width << x;
  std::cout << "\n";
}

Compile with something like:

  • cl /EHsc /W4 /Ox /std:c++17 /I C:\boost\include a.cpp
  • clang++ -Wall -Wextra -pedantic-errors -O3 -std=c++17 a.cpp
  • etc

The toy is not tolerant of bad input: make sure that you give it at least one argument and that all arguments are integers. You can change it to work over any arbitrary std::string if you wish: just adjust the first three statements in main() to fill xs with strings and compute width correctly.
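For example, the string variant of those first statements might look like this (my sketch; the helper names are made up, and it keeps the same "at least one argument" assumption as the toy):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Fill xs straight from the arguments instead of converting with stoi.
std::vector <std::string> args_to_strings( int argc, char ** argv )
{
  return std::vector <std::string>( argv + 1, argv + argc );
}

// Column width from the longest string, replacing the log10 computation.
// Assumes xs is non-empty, like the rest of the toy.
int string_column_width( const std::vector <std::string> & xs )
{
  auto longest = std::max_element( xs.begin(), xs.end(),
      []( const auto & a, const auto & b ) { return a.size() < b.size(); } );
  return 1 + (int)longest->size();
}
```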

Here’s me playing with it on my Windows box:

a 2 -1 2 4 -1 2 3 7 -1 2 0 5

12 integers
value:     2  -1   2   4  -1   2   3   7  -1   2   0   5
index: [   0   1   2   3   4   5   6   7   8   9  10  11 ]

7 unique integers
sorted:  -1   0   2   3   4   5   7

 -1 appears 3 times at [ 1 4 8 ]
  0 appears 1 times at [ 10 ]
  2 appears 4 times at [ 0 2 5 9 ]
  3 appears 1 times at [ 6 ]
  4 appears 1 times at [ 3 ]
  5 appears 1 times at [ 11 ]
  7 appears 1 times at [ 7 ]

rebuilt:   2  -1   2   4  -1   2   3   7  -1   2   0   5

a 1 1 1

3 integers
value:     1   1   1
index: [   0   1   2 ]

1 unique integers
sorted:   1

  1 appears 3 times at [ 0 1 2 ]

rebuilt:   1   1   1

a 3 1 2

3 integers
value:     3   1   2
index: [   0   1   2 ]

3 unique integers
sorted:   1   2   3

  1 appears 1 times at [ 1 ]
  2 appears 1 times at [ 2 ]
  3 appears 1 times at [ 0 ]

rebuilt:   3   1   2

Good question!

Dúthomhas
2

Here is a version that just uses one or two temporary vectors. Overall complexity is O(n log(n)) for n input values.

#include <algorithm>
// using std::stable_sort, std::unique, std::unique_copy
#include <iterator>
// using std::back_inserter
#include <vector>
// using std::vector


/** Helper to implement argsort-like behavior */
template<class T>
struct SortPair
{
  T value;
  std::size_t index;

  SortPair(const T& value, std::size_t index)
    : value(value), index(index)
  {}
  SortPair(T&& value, std::size_t index)
    : value(std::move(value)), index(index)
  {}
  bool operator<(const SortPair& o) const
  { return value < o.value; }

  bool operator<(const T& o) const
  { return value < o; }

  friend bool operator<(const T& left, const SortPair& right)
  { return left < right.value; }

  bool operator==(const SortPair& o) const
  { return value == o.value; }

  friend void swap(SortPair& left, SortPair& right)
  {
      using std::swap;
      swap(left.value, right.value);
      swap(left.index, right.index);
  }
};
/**
 * Implements numpy.unique
 *
 * \tparam T scalar value type
 * \tparam Iterator input iterator for type T
 * \param first, last range of values
 * \param index if not null, returns the first indices of each unique value in
 *    the input range
 * \param inverse if not null, returns a mapping to reconstruct the input range
 *    from the output array. first[i] == returnvalue[inverse[i]]
 * \param count if not null, returns the number of times each value appears
 * \return sorted unique values from the input range
 */
template<class T, class Iterator>
std::vector<T> unique(Iterator first, Iterator last,
                      std::vector<std::size_t>* index=nullptr,
                      std::vector<std::size_t>* inverse=nullptr,
                      std::vector<std::size_t>* count=nullptr)
{
  std::vector<T> uvals;
  if(! (index || inverse || count)) { // simple case
    uvals.assign(first, last);
    using t_iter = typename std::vector<T>::iterator;
    const t_iter begin = uvals.begin(), end = uvals.end();
    std::sort(begin, end);
    uvals.erase(std::unique(begin, end), end);
    return uvals;
  }
  if(first == last) { // early out. Helps with inverse computation
    for(std::vector<std::size_t>* arg: {index, inverse, count})
      if(arg)
        arg->clear();
    return uvals;
  }
  using sort_pair = SortPair<T>;
  using sort_pair_iter = typename std::vector<sort_pair>::iterator;
  std::vector<sort_pair> sorted;
  for(std::size_t i = 0; first != last; ++i, ++first)
    sorted.emplace_back(*first, i);
  const sort_pair_iter sorted_begin = sorted.begin(), sorted_end = sorted.end();
  // stable_sort to keep first unique index
  std::stable_sort(sorted_begin, sorted_end);
  /*
   * Compute the unique values. If we still need the sorted original values,
   * this must be a copy. Otherwise we just reuse the sorted vector.
   * Note that the equality operator in SortPair only uses the value, not index
   */
  std::vector<sort_pair> upairs_copy;
  const std::vector<sort_pair>* upairs;
  if(inverse || count) {
    std::unique_copy(sorted_begin, sorted_end, std::back_inserter(upairs_copy));
    upairs = &upairs_copy;
  }
  else {
    sorted.erase(std::unique(sorted_begin, sorted_end), sorted_end);
    upairs = &sorted;
  }
  uvals.reserve(upairs->size());
  for(const sort_pair& i: *upairs)
    uvals.push_back(i.value);
  if(index) {
    index->clear();
    index->reserve(upairs->size());
    for(const sort_pair& i: *upairs)
      index->push_back(i.index);
  }
  if(count) {
    count->clear();
    count->reserve(uvals.size());
  }
  if(inverse) {
    inverse->assign(sorted.size(), 0);
    // Since inverse->assign fills with index 0, we can skip the 0-th outer loop
    sort_pair_iter sorted_i =
      std::upper_bound(sorted_begin, sorted_end, uvals.front());
    if(count)
      count->push_back(std::distance(sorted_begin, sorted_i));
    for(std::size_t i = 1; i < uvals.size(); ++i) {
      const T& uval = uvals[i];
      const sort_pair_iter range_start = sorted_i;
      // we know there is at least one value
      do
        (*inverse)[sorted_i->index] = i;
      while(++sorted_i != sorted_end && sorted_i->value == uval);
      if(count)
        count->push_back(std::distance(range_start, sorted_i));
    }
  }
  if(count && ! inverse) {
    sort_pair_iter range_start = sorted_begin;
    for(const T& uval: uvals) {
      sort_pair_iter range_end =
        std::find_if(std::next(range_start), sorted_end,
                     [&uval](const sort_pair& i) -> bool {
                       return i.value != uval;
                     });
      count->push_back(std::distance(range_start, range_end));
      range_start = range_end;
    }
    /*
     * We could use equal_range or a custom version based on an
     * exponential search to reduce the number of comparisons.
     * The reason we don't use equal_range is because it has worse
     * performance in the worst case (uvals.size() == sorted.size()).
     * We could make a heuristic and dispatch between both implementations
     */
  }
  return uvals;
}

The trick is to implement something resembling np.argsort. Everything else follows naturally. The inverse mapping is a bit tricky but not too hard when you do a pen-and-paper version of it first.
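In isolation, the argsort step amounts to sorting a permutation of indices by the values they point to. A minimal stand-alone sketch (not part of the code above) could be:

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// np.argsort analogue: returns the permutation that sorts xs.
// stable_sort keeps equal values in their original order, which is what
// lets unique() above report the *first* index of each unique value.
template<class T>
std::vector<std::size_t> argsort(const std::vector<T>& xs)
{
  std::vector<std::size_t> perm(xs.size());
  std::iota(perm.begin(), perm.end(), std::size_t{0});
  std::stable_sort(perm.begin(), perm.end(),
                   [&xs](std::size_t a, std::size_t b) { return xs[a] < xs[b]; });
  return perm;
}
```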

unordered_map

We could also use unordered_map. That reduces the complexity to roughly O(n) + O(m log(m)) for n entries with m unique values if the unique values are sorted, and O(n) if they are left unsorted.

But that involves a ton of temporary allocations if the number of unique values is not much smaller than n. This issue can be mitigated by switching to a hash-map implementation such as robin_hood, which uses a flat hash map as long as the key-value pair is at most 6 size_t large (by default).

However, one should not underestimate the sheer performance of a hash map, even a mediocre one such as std::unordered_map. Here is a simple version that works well in practice.


#ifdef USE_ROBIN_HOOD
# include <robin_hood.h>
#else
# include <unordered_map>
#endif

template<class T, class Iterator>
std::vector<T>
unordered_unique(Iterator first, Iterator last,
                 std::vector<std::size_t>* index=nullptr,
                 std::vector<std::size_t>* inverse=nullptr,
                 std::vector<std::size_t>* count=nullptr)
{
#ifdef USE_ROBIN_HOOD
  using index_map = robin_hood::unordered_map<T, std::size_t>;
#else
  using index_map = std::unordered_map<T, std::size_t>;
#endif
  using map_iter = typename index_map::iterator;
  using map_value = typename index_map::value_type;
  for(std::vector<std::size_t>* arg: {index, inverse, count})
    if(arg)
      arg->clear();
  std::vector<T> uvals;
  index_map map;
  std::size_t cur_idx = 0;
  for(Iterator i = first; i != last; ++cur_idx, ++i) {
    const std::pair<map_iter, bool> inserted =
      map.emplace(*i, uvals.size());
    map_value& ival = *inserted.first;
    if(inserted.second) {
      uvals.push_back(ival.first);
      if(index)
        index->push_back(cur_idx);
      if(count)
        count->push_back(1);
    }
    else if(count)
      (*count)[ival.second] += 1;
    if(inverse)
      inverse->push_back(ival.second);
  }
  return uvals;
}

I tested this with N = 1 million random int entries modulo M. With M = N (~600,000 unique values), the Robin Hood version beats the sorting version. With M = N/10 (~100,000 unique values), even the std version beats the sorting version.
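For reproducibility, the test data described above can be generated along these lines (my sketch; the generator and seed are arbitrary choices, not necessarily the ones used for the numbers quoted):

```cpp
#include <cstddef>
#include <random>
#include <vector>

// n random ints modulo m, deterministic for a given seed.
std::vector<int> random_input(std::size_t n, int m, unsigned seed = 42)
{
  std::mt19937 gen(seed);
  std::uniform_int_distribution<int> dist(0, m - 1);
  std::vector<int> xs(n);
  for(int& x: xs)
    x = dist(gen);
  return xs;
}
```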

Homer512
  • This solution works for me, thank you. In my application, the number of different items is quite high (between 1/10 and 1/5 of the total number of entries), so like Jérôme Richard said, your solution is thus probably better for me than the map-based one proposed by Dúthomhas. I have performance very similar to what I get with numpy – Johann Bzh Feb 15 '22 at 08:10
  • I also want to add that the items of the container in my application are coordinates (2D or 3D vectors typically). To sort them in the algorithm you've proposed, I just use std::lexicographical_compare as comparison operator, this is very efficient. – Johann Bzh Feb 15 '22 at 08:13
2

In addition to the two current great posted answers, there are some important points to take into account in order to write an efficient implementation.

First of all, using a hash map can be very efficient when the number of distinct items is small enough for the hash map to fit in cache. Classical sorting algorithms (e.g. introsort/mergesort/heapsort) tend to be significantly slower because of the log n factor in their complexity. However, when the number of distinct items is large, a sort is generally much faster, since the unpredictable, random-like hash-map access pattern in RAM can be very expensive: each access can be as slow as the RAM latency (typically 40~100 ns).

Additionally, sort implementations can be vectorized using SIMD instructions (implementations exist for both AVX-512 and SVE) and parallelized using multithreading. This is especially relevant here since np.unique is generally applied to numbers. C++17 provides parallel algorithms (the PSTL), including parallel sorts. NumPy's sorting algorithms have recently been improved to use AVX-512, resulting in an order-of-magnitude faster execution. Alternatively, an O(n) radix sort can be used to efficiently sort arrays with short fixed-size keys (e.g. 4 bytes).
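As an illustration of the radix-sort idea, a minimal LSD radix sort for 32-bit unsigned keys, processing one byte per pass, could look like this (a sketch, not a tuned implementation):

```cpp
#include <array>
#include <cstdint>
#include <vector>

// LSD radix sort: 4 stable counting-sort passes, one per byte.
// O(n) work overall for fixed-size keys.
std::vector<std::uint32_t> radix_sort(std::vector<std::uint32_t> xs)
{
  std::vector<std::uint32_t> tmp(xs.size());
  for(int shift = 0; shift < 32; shift += 8) {
    std::array<std::size_t, 257> bucket{};  // zero-initialized histogram
    for(std::uint32_t x: xs)
      ++bucket[((x >> shift) & 0xFFu) + 1];
    for(int i = 0; i < 256; ++i)            // exclusive prefix sum
      bucket[i + 1] += bucket[i];
    for(std::uint32_t x: xs)                // stable scatter by current byte
      tmp[bucket[(x >> shift) & 0xFFu]++] = x;
    xs.swap(tmp);
  }
  return xs;  // even number of swaps: result ends up back in xs
}
```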

Moreover, not all hash-map implementations are equivalent in terms of performance. The STL std::unordered_map tends to be pretty slow. This is due to restrictions in the C++ standard (about the invalidation of iterators) that (almost) force the STL not to use an open-addressing strategy. Instead, STL implementations generally use linked lists to resolve hash collisions, and this causes the slow allocations described by @Homer512. In fact, the fastest existing hash-map implementations mainly use open addressing. For example, hopscotch hashing provides good performance with some interesting guarantees (e.g. good performance even when the load factor is close to 90~100%). Tessil has provided a very interesting benchmark of several hash-map implementations and has written very fast ones (generally much faster than those of the mainstream STL implementations).

std::map is usually implemented as a red-black tree, and it is written in a pretty efficient way in mainstream STL implementations. However, the default allocator is clearly not optimized for this use case. Indeed, the number of nodes to be inserted is bounded, so one can use a monotonic allocator to speed up the computation. An optimized implementation can even use a few big arrays with no additional allocation.
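With C++17, the monotonic-allocator idea can be tried without writing a custom allocator, via <memory_resource> (a sketch of the principle, using a simple value-counting map rather than the full index map):

```cpp
#include <map>
#include <memory_resource>
#include <vector>

// All tree nodes are carved out of one memory pool, so insertions avoid
// repeated calls to the general-purpose allocator. The pool must outlive
// the returned map.
std::pmr::map<int, int> count_values(const std::vector<int>& xs,
                                     std::pmr::monotonic_buffer_resource& pool)
{
  std::pmr::map<int, int> m(&pool);
  for(int x: xs)
    ++m[x];  // value-initialized to 0 on first access
  return m;
}
```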

Finally, note that np.unique provides the index of the first occurrence of each unique value, not all of them. This enables further optimizations. For example, no std::vector is needed for the std::map-based implementation of @Dúthomhas, resulting in a smaller memory footprint and probably higher performance.
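Concretely, the std::map-based approach can keep just the first index per key, since map::emplace only inserts when the key is absent (a sketch of that simplification, not @Dúthomhas's code):

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Like np.unique with return_index: maps each unique value to the index
// of its first occurrence; no per-key std::vector needed.
template<class T>
std::map<T, std::size_t> first_index_map(const std::vector<T>& xs)
{
  std::map<T, std::size_t> m;
  for(std::size_t i = 0; i < xs.size(); ++i)
    m.emplace(xs[i], i);  // no-op if xs[i] is already a key
  return m;
}
```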

Jérôme Richard