In another question I asked how to compile a vector holding a huge amount of data, because I wanted a dictionary of 107776 entries and couldn't get it to compile.

I solved it thanks to this answer's code:

char const * const dict[] = {"aaron",...};

But now the problem is that when I attempt to access one entry...

cout<<dict[431104]<<endl;

...the program freezes and Windows wants to close it.

Why does it happen? How can I solve it?

Edit: sorry, it was my fault. As tbroberg and Seth Carnegie noticed in this answer, the mistake was that I thought that sizeof(dict) was the length of the array (instead of sizeof(dict)/sizeof(*dict)). Therefore, 431104 was far out of the bounds of the array (its length is 107776).
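
To illustrate the difference described in the edit, here is a minimal sketch. It uses a hypothetical three-entry array as a stand-in for the real dictionary, and the names `dict_bytes`, `dict_len` and `last_entry` are invented for the example. `sizeof(dict)` gives the size in bytes, while `sizeof(dict) / sizeof(*dict)` gives the number of entries.

```cpp
#include <cstddef>

// Hypothetical three-entry stand-in for the real 107776-entry dictionary.
char const * const dict[] = {"aaron", "abacus", "zebra"};

// Bytes occupied by the whole array of pointers (what sizeof(dict) reports).
const std::size_t dict_bytes = sizeof(dict);

// Number of entries: total bytes divided by the bytes of one element.
const std::size_t dict_len = sizeof(dict) / sizeof(*dict);

// The last valid index is dict_len - 1; dict[dict_len] is one past the end.
char const * last_entry() {
    return dict[dict_len - 1];
}
```

With 4-byte pointers, 107776 entries occupy 431104 bytes, which would explain where the index in the question came from.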

Oriol
  • You need to do what the comment in that other question suggested, put it in a file and parse it. You can't store that much stuff on the stack. – Seth Carnegie Jul 19 '12 at 18:49
  • @up True. By the way, I would recommend learning how to use a debugger. It's a really useful tool :) – Blood Jul 19 '12 at 18:50
  • Is dict[] a local or a global variable? – PiotrNycz Jul 19 '12 at 18:54
  • @SethCarnegie I would expect this to put the array of pointers on the stack with the strings in the data segment, (not that putting it in a file is a bad idea.) – tbroberg Jul 19 '12 at 18:55
  • @tbroberg that is correct, the string literals will be in the data segment, but that doesn't matter; 107776 `char*`s would be around a megabyte if each pointer is 8 bytes, and that's about the size of a normal stack. – Seth Carnegie Jul 19 '12 at 18:58
  • How many strings do you have in your dictionary? Is it >= 431105? Does Windows give an error message when it closes? If you look at dict in the debugger, how many elements does it have? Can you inspect element 431104? – tbroberg Jul 19 '12 at 18:59
  • @SethCarnegie Ah, and 400k+ pointers is *very* likely to lead to stack overflow. So the pointers need to be allocated on the heap with new, regardless of where the array comes from - static data or file. The Windows stack size is so vast compared to embedded environments that I had come to think of it as limitless. ;^) – tbroberg Jul 19 '12 at 19:02
  • @tbroberg What is the heap? So, if it's better, in order to allocate `dict` in heap, must I use `new char const * const dict[]={...}` instead of `char const * const dict[]={...}` and that's all, or should I do something else? – Oriol Jul 20 '12 at 19:34
  • Yes, use new to allocate stuff on the heap and delete to free it (delete[] for arrays). – tbroberg Jul 21 '12 at 21:41

2 Answers


If dict is a local variable, you are allocating 107776 char*s on the stack, which might be enough to cause a stack overflow on your computer. You can try allocating the char*s on the heap and using an initialiser list (this needs C++11):

const char** dict = new const char*[107776] {"aaron",...};

// ... use dict

delete[] dict;

That should fix the problem (if the problem is stack size, which I think it is).

Also, I just noticed that your index, 431104, is far out of the bounds of the array, which is of the size 107776 (I misread it as 1 million before). Are you sure your problems haven't just been out-of-bounds indices?
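
As an alternative to a manual `new[]`/`delete[]` pair, the pointers could be held in a `std::vector`, which keeps its elements on the heap and offers bounds-checked access through `at()`. This swaps in a different technique from the code above and is only a sketch: `words`, `make_dict` and `has_entry` are hypothetical names, and the three entries stand in for the real dictionary.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical three-entry stand-in for the real dictionary.
static const char * const words[] = {"aaron", "abacus", "zebra"};

// Copies the pointers into a std::vector; its element storage lives on the
// heap and is released automatically when the vector is destroyed.
std::vector<const char*> make_dict() {
    return std::vector<const char*>(words, words + sizeof(words) / sizeof(*words));
}

// Returns true when i is a valid index. at() throws std::out_of_range for an
// invalid index instead of reading past the end like operator[] would.
bool has_entry(const std::vector<const char*>& d, std::size_t i) {
    try {
        (void)d.at(i);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

With this, an index such as 431104 produces an exception that can be caught and reported rather than undefined behaviour.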

Seth Carnegie
  • Oops! It seems you are right and 431104 is far out of the bounds of the array. I wanted to get the length of dict (I knew it is 107776 because I calculated it with JavaScript), but dict.size() didn't work, so I used sizeof(dict). Thanks to your answer I remembered that sizeof gives the size of the array in bytes (which isn't what I wanted). If I replace sizeof(dict) with 107776 it seems to work! – Oriol Jul 19 '12 at 19:38
  • @Oriol you can calculate the size of an array in elements by doing `sizeof(dict) / sizeof(*dict)`. Beware that that number is one past the end of the array if you use it as an index though. – Seth Carnegie Jul 19 '12 at 20:04

Try putting static before const:

static const char *const dict [] = { "a...", ...

Now it is allocated in the initialized data segment. Depending on the capabilities of your platform and compiler/linker, it might just work.

The operating system will usually take care of caching/swapping of this data, so if the data are really immutable, it is the preferred method.
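
A minimal sketch of the same idea inside a function, assuming the array would otherwise be declared as a local variable. The function name `entry` and the three entries are hypothetical. Marking the array `static` gives it static storage duration, so it is not part of the function's stack frame.

```cpp
#include <cstddef>

// Returns the entry at index i, or a null pointer if i is out of range.
char const * entry(std::size_t i) {
    // 'static' gives the array static storage duration: it is set up once
    // and is not placed on the stack each time the function is called.
    static char const * const dict[] = {"aaron", "abacus", "zebra"};
    const std::size_t len = sizeof(dict) / sizeof(*dict);
    return i < len ? dict[i] : nullptr;
}
```

An array declared at namespace (global) scope already has static storage duration, so this mainly matters for arrays declared inside functions.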

fork0