4

I have the following multiset in C++:

template<class T>
class CompareWords {
public:
    bool operator()(T s1, T s2)
    {
        if (s1.length() == s2.length())
        {
            return ( s1 < s2 );
        }
        else return ( s1.length() < s2.length() );
    }
};
typedef multiset<string, CompareWords<string>> mySet;
typedef std::multiset<string,CompareWords<string>>::iterator mySetItr;
mySet mWords;

I want to print each unique element of type std::string in the set once and next to the element I want to print how many time it appears in the list (frequency), as you can see the functor "CompareWord" keeps the set sorted.

A solution is proposed here, but its not what I need, because I am looking for a solution without using (while,for,do while).

I know that I can use this:

//gives a pointer to the first and last range or repeated element "word"
auto p = mWords.equal_range(word);
// compute the distance between the iterators that bound the range AKA frequency 
int count = static_cast<int>(std::distance(p.first, p.second));

but I can't quite come up with a solution without loops?

Community
  • 1
  • 1
InsaneBot
  • 2,422
  • 2
  • 19
  • 31

3 Answers3

2

Unlike the other solutions, this iterates over the list exactly once. This is important, as iterating over a structure like std::multimap is reasonably high overhead (the nodes are distinct allocations).

There are no explicit loops, but the tail-end recursion will be optimized down to a loop, and I call an algorithm that will run a loop.

template<class Iterator, class Clumps, class Compare>
void produce_clumps( Iterator begin, Iterator end, Clumps&& clumps, Compare&& compare) {
  if (begin==end) return; // do nothing for nothing
  typedef decltype(*begin) value_type_ref;
  // We know runs are at least 1 long, so don't bother comparing the first time.
  // Generally, advancing will have a cost similar to comparing.  If comparing is much
  // more expensive than advancing, then this is sub optimal:
  std::size_t count = 1;
  Iterator run_end = std::find_if(
    std::next(begin), end,
    [&]( value_type_ref v ){
      if (!compare(*begin, v)) {
        ++count;
        return false;
      }
      return true;
    }
  );
  // call our clumps callback:
  clumps( begin, run_end, count );
  // tail end recurse:
  return produce_clumps( std::move(run_end), std::move(end), std::forward<Clumps>(clumps), std::forward<Compare>(compare) );
}

The above is a relatively generic algorithm. Here is its use:

int main() {
  typedef std::multiset<std::string> mySet;
  typedef std::multiset<std::string>::iterator mySetItr;

  mySet mWords { "A", "A", "B" };

  produce_clumps( mWords.begin(), mWords.end(),
    []( mySetItr run_start, mySetItr /* run_end -- unused */, std::size_t count )
    {
      std::cout << "Word [" << *run_start << "] occurs " << count << " times\n";
    },
    CompareWords<std::string>{}
  );
}

live example

The iterators must refer to a sorted sequence (with regards to the Comparator), then the clumps will be passed to the 3rd argument together with their length.

Every element in the multiset will be visited exactly once with the above algorithm (as a right-hand side argument to your comparison function). Every start of a clump will be visited (length of clump) additional times as a left-hand side argument (including clumps of length 1). There will be exactly N iterator increments performed, and no more than N+C+1 iterator comparisons (N=number of elements, C=number of clumps).

Yakk - Adam Nevraumont
  • 262,606
  • 27
  • 330
  • 524
1
#include <iostream>
#include <algorithm>
#include <set>
#include <iterator>
#include <string>

int main()
{
    typedef std::multiset<std::string> mySet;
    typedef std::multiset<std::string>::iterator mySetItr;

    mySet mWords;

    mWords.insert("A");
    mWords.insert("A");
    mWords.insert("B");

    mySetItr it = std::begin(mWords), itend = std::end(mWords);
    std::for_each<mySetItr&>(it, itend, [&mWords, &it] (const std::string& word)
    {
        auto p = mWords.equal_range(word);
        int count = static_cast<int>(std::distance(p.first, p.second));
        std::cout << word << " " << count << std::endl;
        std::advance(it, count - 1);
    });
}

Outputs:

A 2
B 1

Live demo link.

Piotr Skotnicki
  • 46,953
  • 7
  • 118
  • 160
  • i have xCode giving me two errors: (1) 'mWords' in capture list does not name a variable (2)'this' cannot be implicitly captured in this context – InsaneBot Aug 24 '14 at 12:39
  • if `mWord` is a class' member, use `[this, &it]` in capture list – Piotr Skotnicki Aug 24 '14 at 12:42
  • it works, i struggling to understand how lambda capture things, any suggestion for a good resource to learn more ? – InsaneBot Aug 24 '14 at 12:48
  • Start with http://en.wikipedia.org/wiki/Anonymous_function#C.2B.2B_.28since_C.2B.2B11.29 – Piotr Skotnicki Aug 24 '14 at 12:51
  • `std::for_each` is *very* bad. Reference types don't satisfy the iterator requirements (they are not `CopyAssignable`), and the library implementation may actually decide to call a helper function using template argument deduction, in which case the referenceness will be dropped. – T.C. Aug 24 '14 at 13:05
1

Following does the job without explicit loop using recursion:

void print_rec(const mySet& set, mySetItr it)
{
    if (it == set.end()) {
        return;
    }
    const auto& word = *it;
    auto next = std::find_if(it, set.end(),
        [&word](const std::string& s) {
            return s != word;
        });
    std::cout << word << " appears " << std::distance(it, next) << std::endl;
    print_rec(set, next);
}

void print(const mySet& set)
{
    print_rec(set, set.begin());
}

Demo

Jarod42
  • 203,559
  • 14
  • 181
  • 302
  • This code can recurse as many times as there're non-duplicate elements in the `multiset`. Isn't that a recipe for stack overflow? – Arun R Aug 25 '14 at 15:02
  • Ignore previous comment. It looks like tail-end recursion avoids stack overflow and is an optimization implemented by the compiler. – Arun R Aug 25 '14 at 15:11