-1

In my Algorithm, I need to keep all the combinations of (3 bytes of) extended ASCII characters. Following is my code But when i run this code, the program gets killed on terminal when the last step occurs(BigVector.pushback).Why is this so and what can be the alternative in my case?

vector<set<vector<int> > > BigVector;
set<vector<int> > SmallSet;


    for(int k=0; k <256; k++)
    {
        for(int j=0; j <256; j++)
        {     

            for(int m=0; m <256; m++)
            { 
                    vector<int> temp;
                temp.push_back(k);
                temp.push_back(j);
                temp.push_back(m);
                SmallSet.insert(temp);
            }
        }


    }

    BigVector.push_back(SmallSet);

P.S: I have to keep the ascii characters like this: { {(a,b,c) ,(a,b,d),...... (z,z,z)} }

Xara
  • 8,748
  • 16
  • 52
  • 82
  • How big is your stack? – johnsyweb Feb 18 '14 at 10:55
  • 3
    Is this for an algorithms class? If so, you should rethink this data structure, because it's terribly inefficient. – Potatoswatter Feb 18 '14 at 10:59
  • @Potatoswatter I have to make a big set, which comprises of subsets. Each subset may contain one or more sets like this (a,b,c) etc. That's why I used this data structure. Can you please give me suggestions what can be the alternatives? – Xara Feb 18 '14 at 11:04
  • @Zara, are subsets of variable length? – Shoe Feb 18 '14 at 11:05
  • @Jefffrey Yes they are of variable length. – Xara Feb 18 '14 at 11:05
  • @Johnsyweb: Why is the stack relevant? – Lightness Races in Orbit Feb 18 '14 at 11:12
  • 1
    Well, an ordered sequence of three bytes can be replaced by an `array` or even a simple `uint32_t`. And instead of generating all the possibilities and then pruning them by the algorithm, you could just loop over all the integers from 1 to 2^24 and store the results that satisfy whatever requirements. – Potatoswatter Feb 18 '14 at 11:24

2 Answers2

2

Please note that 256^3 = 16,777,216. This is huge, especially when you use vector and set!

Because you only need to record 256 = 2^8 information, you can store this in a char ( one byte). You can store each combination in one tuple of three chars. The memory is now 16,777,216 / 1024 / 1024 = 16 MB. On my computer, it finishes in 1 second.

If you accept C++11, I would suggest using std::array, instead of writing a helper struct like Info in my old code.

C++11 code using std::array.

vector<array<char,3>> bs;
.... for loop
    array<char,3> temp;
    temp[0]=k; temp[1]=j; temp[2]=m;
    bs.push_back(temp);

C++98 code using home-made struct.

struct Info{
    char chrs[3];
    Info ( char c1, char c2, char c3):chrs({c1,c2,c3}){}
};

int main() {
    vector<Info> bs; 
    for (int k = 0; k < 256; k++) {
        for (int j = 0; j < 256; j++) {
            for (int m = 0; m < 256; m++) {
                bs.push_back(Info(k,j,m));
            }
        }
    }
    return 0;
}

Ways to use the combinations. (You can write wrapper method for Info).

// Suppose s[256] contains the 256 extended chars.
for( auto b : bs){
    cout<< s[b.chrs[0]] << "  " << s[b.chrs[1]] << "  "<< s[b.chrs[2]] << endl;
}
Peng Zhang
  • 3,475
  • 4
  • 33
  • 41
  • @Zara I added my solution. Please try it and whether it is OK. :-) – Peng Zhang Feb 18 '14 at 11:15
  • I have used the c++11 method and now I am creating vector of set of array. The program is not getting killed. – Xara Feb 18 '14 at 15:50
  • Further on when I am using vector of set of array, it is taking lot of time to compute the results. Any other modification suggestion in data structure which can make it faster? – Xara Feb 18 '14 at 18:04
  • @Zara I am not sure the reason for you to use ``set of array``? Each combination corresponds to an array. Thus, you only needs to iterate through the vector to get each combination. Is there any specific reason(scenario) for you to use "set of array"? – Peng Zhang Feb 18 '14 at 23:20
2

First: your example doesn't correspond with the actual code. You are creating ( { (a,a,a), ..., (z,z,z) } )

As already mentioned you will have 16'777'216 different vectors. Every vector will hold the 3 characters and typically ~20 bytes[1] overhead because of the vector object.

In addition a typical vector implementation will reserve memory for future push_backs.

You can avoid this by specifying the correct size during initialization or using reserve():

vector<int> temp(3);

(capacity() tells you the "real" size of the vector)

push_back makes a copy of the object you are pushing [2], which might be too much memory and therefore crashing your program.

16'777'216 * (3 characters + 20 overhead) * 2 copy = ~736MiB.
(This assumes that the vectors are already initialized with the correct size!)

See [2] for a possible solution to the copying problem.

I do agree with Potatoswatter: your data structure is very inefficient.

[1] What is the overhead cost of an empty vector?
[2] Is std::vector copying the objects with a push_back?

Community
  • 1
  • 1
close2
  • 308
  • 1
  • 7