I am writing a C++ program that checks whether some words exist in Catalan, so I have a vector containing the Catalan dictionary:

const vector<string> dict={"aaron","ababol","abac","abaca","abacallanada","abacallanava","abacas","abacial", ... ,"zum-zum","zur","zuric","zwitterio"};

The problem is that the dictionary has 107776 entries, so when I attempt to compile the file:

g++ -Wall file.cc -std=c++0x -o file.exe

it does nothing for a while, and then Windows says that it isn't responding and closes it.

How can I compile it? Is there a better way of storing this type of data (arrays, ...)?

Oriol
  • Put them in a file and parse it at run-time? – Mysticial Jul 17 '12 at 22:26
  • I would get your program to read in a file containing the dictionary. Makes the dictionary easier to maintain as well. – Ed Heal Jul 17 '12 at 22:27
  • Compiling huge amounts of data into your binary has a lot of downsides, but few advantages. Consider not doing this! – Oliver Charlesworth Jul 17 '12 at 22:28
  • Hmmm, I'm currently compiling a gigantic array that results in a 5MB object file, and it takes about three or four seconds for me. – R. Martinho Fernandes Jul 17 '12 at 22:32
  • Are you compiling it through an IDE? Perhaps it might work better on the command line (Cygwin, and run in the background perhaps)? – Ed Heal Jul 17 '12 at 22:35
  • For a way to link data directly into your program see: http://stackoverflow.com/a/4865249/168175 you'll need to lay it out in a way that makes sense though. – Flexo Jul 17 '12 at 22:38

4 Answers

You may well have more luck with old-school built-in arrays:

char const * const dict[] = {"aaron",...};

This will generate a load of string literals and an array of pointers to them, which shouldn't be too much of a strain for the compiler. This will also use no more memory than necessary, with little or no work at runtime.

Alternatively, std::array<char const *, N> (where N is the number of words) should be just as efficient, with more of a C++ look and feel.

Your version also has to generate an enormous amount of code to build an initializer_list from those literals, construct a string from each, and add each string to the vector. It will also require more than twice as much memory, since each string literal needs to be copied into memory allocated at run time, and then all those pointers need to be stored in another run-time-allocated array.

The disadvantage is that you may end up constructing a temporary string each time you read from the dictionary. If that's a concern, then an array of std::string might be a reasonable compromise.
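As a rough sketch of how lookups against such a built-in array could work: the snippet below assumes the words are stored in strcmp order (the excerpt in the question looks sorted, but that is an assumption), and the names sampleDict and inDict are made up for illustration. The six-word array is only a stand-in for the real list.

```cpp
#include <algorithm>
#include <cstring>
#include <iterator>

// Tiny stand-in for the real word list; assumed to be sorted in strcmp order.
char const * const sampleDict[] = {"aaron", "ababol", "abac", "abaca", "zur", "zuric"};

// Returns true if word is present, using binary search over the sorted array.
// No std::string is constructed; the comparison works on the raw C strings.
bool inDict(char const * word) {
    return std::binary_search(
        std::begin(sampleDict), std::end(sampleDict), word,
        [](char const * a, char const * b) { return std::strcmp(a, b) < 0; });
}
```

Calling inDict("abac") would return true for this sample, and inDict("abacus") would return false. If the real list is not sorted this way, a linear std::find_if with strcmp would be needed instead.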

Mike Seymour

Store it in an external file and load it on demand. This is the best solution; otherwise, I suppose you could split your vector into multiple vectors and maybe put them into separate cpp files.

marcinj
  • Ok, but why? If Firefox (in JavaScript, which is very slow) can handle that amount of data easily, why can't the compiler? Wouldn't it be slower if the dictionary is read on demand instead of being compiled in beforehand? – Oriol Jul 17 '12 at 22:40
  • @Oriol As you have already discovered, the compiler isn't able to handle it. So it's not really a matter of performance. – Mysticial Jul 17 '12 at 22:47

Store the dictionary in a text file, one word per line. Then add this code to your program:

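{
  // Requires #include <fstream> and #include <string>.
  // "dictionary.txt" is a placeholder; use the path to your own word list.
  std::string inputFileName = "dictionary.txt";
  std::ifstream inputFile(inputFileName);
  std::string word;
  // getline keeps entries with embedded spaces intact.
  // dict must be a non-const std::vector<std::string> here.
  while( std::getline(inputFile, word) )
    dict.push_back(word);
}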
Shahbaz
Robᵩ
  • I'd prefer constructing the dictionary from a pair of `std::istream_iterator`s, which takes care of the `move`s for you. The code in this answer does string copies left and right. – Mooing Duck Jul 17 '12 at 22:40
  • And it makes you look smarter. – chris Jul 17 '12 at 22:41
  • I'd do it that way too, if there was a guarantee that the input had no embedded spaces. I'm not familiar with Catalan vocabulary, so I used `getline`. – Robᵩ Jul 18 '12 at 02:05

Would it be possible to load only a subset of the dictionary from a file, using the methods in the other answers? For example, load only the words starting with "a" from a file a.dic. Or do you need access to the entire dictionary at once?
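A minimal sketch of that idea, assuming the dictionary has been split into one file per initial letter, named like a.dic, with one word per line. The file layout and the helper names dictFileFor and loadSubset are hypothetical, not something given in the question.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Maps a word to the hypothetical per-letter file that would contain it,
// e.g. "abac" -> "a.dic". Returns an empty name for an empty word.
std::string dictFileFor(const std::string& word) {
    return word.empty() ? std::string() : std::string(1, word[0]) + ".dic";
}

// Loads only the subset of the dictionary that could contain word.
// Returns an empty vector if the file cannot be opened.
std::vector<std::string> loadSubset(const std::string& word) {
    std::vector<std::string> subset;
    std::ifstream in(dictFileFor(word).c_str());
    std::string line;
    while (std::getline(in, line))
        subset.push_back(line);
    return subset;
}
```

Whether this is worthwhile depends on how the lookups are distributed; if most letters end up being loaded anyway, reading the whole dictionary once is simpler.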

Drise