1

I have a trouble, I create a implementation from radix sort algorithm, but I think that I can use less memory, And I can! but... I do this erasing the element of the vector after used. The problem: 3 minutes of execution vs 17 secs. How erase faster the element? or... How use the memory better.

sort.hpp

#include <iostream>

#include <vector>
#include <algorithm>

unsigned digits_counter(long long unsigned x);

void radix( std::vector<unsigned> &vec )
{
  unsigned size = vec.size();
  if(size == 0);
  else {
    unsigned int max = *max_element(vec.begin(), vec.end());
    unsigned int digits = digits_counter( max );

    // FOR EVERY 10 POWER...
    for (unsigned i = 0; i < digits; i++) {
      std::vector < std::vector <unsigned>  > base(10, std::vector <unsigned> ());

#ifdef ERASE
      // GET EVERY NUMBER IN THE VECTOR AND
      for (unsigned j = 0; j < size; j++) {
    unsigned int digit = vec[0];

    // GET THE DIGIT FROM POSITION "i" OF THE NUMBER vec[j]
    for (unsigned k = 0; k < i; k++)
      digit /= 10;
    digit %= 10;

    // AND PUSH NUMBER IN HIS BASE BUCKET
    base[ digit ].push_back( vec[0] );
    vec.erase(vec.begin());
      }

#else
      // GET EVERY NUMBER IN THE VECTOR AND
      for (unsigned j = 0; j < size; j++) {
    unsigned int digit = vec[j];

    // GET THE DIGIT FROM POSITION "i" OF THE NUMBER vec[j]
    for (unsigned k = 0; k < i; k++)
      digit /= 10;
    digit %= 10;

    // AND PUSH NUMBER IN HIS BASE BUCKET
    base[ digit ].push_back( vec[j] );
      }
      vec.erase(vec.begin(), vec.end()); 
#endif

      for (unsigned j = 0; j < 10; j++)
    for (unsigned k = 0; k < base[j].size(); k++)
      vec.push_back( base[j][k] );
    }
  }
}


void fancy_sort( std::vector <unsigned> &v ) {
  if( v.size() <= 1 )
    return;
  if( v.size() == 2 ) {
    if (v.front() >= v.back())
      std::swap(v.front(), v.back());
    return;
  }
  radix(v);
}

sort.cpp

#include <vector>

#include "sort.hpp"

using namespace std;

int main(void)
{
  vector <unsigned> vec;

  vec.resize(rand()%10000);

  for (unsigned j = 0; j < 10000; j++) {
    for (unsigned int i = 0; i < vec.size(); i++)
      vec[i] = rand() % 100;
    fancy_sort(vec);
  }


  return 0;
}

I'm just learning... it's my 2nd chapter of Deitel C++. So... If someone have a more complex solution... I can learn how use it, the difficulty doesn't matter.

Results Without erase:

g++ -O3 -Wall sort.cpp && time ./a.out
./a.out  2.93s user 0.00s system 98% cpu 2.964 total

With erase:

g++ -D ERASE -O3 -Wall sort.cpp && time ./a.out
./a.out  134.64s user 0.06s system 99% cpu 2:15.20 total
Moises Rojo
  • 371
  • 2
  • 14
  • What optimizations did you use when you built your program? If it is a "debug" or unoptimized build, then the results are meaningless. You also failed to mention the number of numbers you're sorting, and also you didn't post an [mcve]. – PaulMcKenzie Mar 15 '17 at 20:52
  • You can try to create a fixed size std:vector and not do erase/alloc for each digit. The overhead of allocating and releasing memory is probably hurting your runtime. – revelationnow Mar 15 '17 at 20:59
  • @PaulMcKenzie This edit, is almost complete, but you can get a better idea. – Moises Rojo Mar 15 '17 at 21:46

1 Answers1

1

std::vector is not made to have parts removed from it. That's a fact you have to live with. Your trade between speed and memory usage is a classical problem. With vectors, any removal (except from its end) is expensive and is a waste. This is because every time you remove elements, the program either has to reallocate the array internally, or has to shift all elements to fill the voids. That's an ultimate limitation that you can never overcome if you keep using vectors.

vectors for your problem: fast but lots of memory usage.

The other (probably) optimal extreme (memory-wise) is std::list, which has absolutely no problem with removing anything anywhere. On the other hand, accessing elements is only possible by iterating over the whole list to the element, because it's basically a doubly-linked list, and you can't access an element by its number.

lists for your problem: best memory usage, but slow because accessing list elements are slow.

Finally, the middle grounds is std::deque. They offer a middle grounds between vectors and lists. They're not contiguous in memory, but items can be found by their numbers. Removing elements from the middle doesn't necessarily cause the same destruction in vectors. Read this to learn more about them.

deques for your problem: middle grounds, depending on the problem, probably fast for both memory and access.

If memory is your most important concern, then definitely go with lists. That's the fastest for that. If you want the most general solution, go with deque. Also don't neglect the possibility of copying the whole array to another container, sort it, and then copy it back. Depending on the size of the array, that might be helpful.

Community
  • 1
  • 1
The Quantum Physicist
  • 24,987
  • 19
  • 103
  • 189
  • Or.. I can sort from end to begin. and this is more fast, right? You say that eliminate the end is not so expensive. – Moises Rojo Mar 15 '17 at 21:15
  • 1
    @MoisesRojo If that works for your algorithm, then yes. But again, you can *only* remove elements from the end. Otherwise you're destroying performance. – The Quantum Physicist Mar 15 '17 at 21:17