My system: 3.2.0-52-generic kernel, g++-4.7 (Ubuntu/Linaro 4.7.3-2ubuntu1~12.04) 4.7.3, compiled with -O2 unless specified otherwise, CPU: i3-2125.
In my test cases I used a 295,068-word dictionary (so, 100k more words than in yours): http://dl.dropboxusercontent.com/u/4076606/words.txt
From time complexity point of view:
- Worst case, your program's complexity is O(n*n) = O(n [read data] * n [insert into unordered_set])
- Average case, it is O(n) = O(n [read data] * 1 [insert into unordered_set])
Practical tips:
- The simplest data structures have the least time overhead. A plain array is faster than a vector; a char array is faster than a string. There is plenty of information about this on the web.
My measurements:
Notice: I didn't flush my OS cache & HDD cache. The latter I can't control, but the former can be controlled with:
sync; sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
Also, I didn't discard measurements that included a lot of context switches and so on, so there is room for better measurements.
My code (read from file & insert data; search all the words):
14-16 ms to read from the file & insert data into a 2D char array (read & insert, n times)
65-75 ms to search for all the words with binary search (search, n times):
Total=79-91 ms
61-78 ms to read from the file & insert data into an unordered_set of char arrays (read & insert, n times)
7-9 ms to search by hash, n times
Total=68-87 ms
If you search more often than you insert, choose a hash table (unordered_set); otherwise choose binary search (with a simple array).
Your original code (search & insert):
Compiled with -O2: 157-182 ms
Compiled with -O0 (if you omit the -O flag, the optimization level defaults to 0): 223-248 ms
So compiler options also matter; in this case they mean a 66 ms speed boost. You didn't specify any of them, so my best guess is that you didn't use them. That is part of my answer to your main question.
What is the simplest thing you can do to improve your current code?
- [better usage of the high-level API] Use "getline(wordListFile, word)" instead of "wordListFile >> word". I also think "getline" is more readable than the ">>" operator.
Compiled with -O2: 142-170 ms, a ~12-15 ms speed boost compared with your original code.
Compiled with -O0 (if you omit the -O flag, the optimization level defaults to 0): 213-235 ms, a ~10-13 ms speed boost compared with your original code.
- [better usage of the high-level API] Avoid rehashing with "dict.reserve(XXXXXX);" (@David Rodríguez - dribeas also mentioned this). It helps if your dictionary is static, or if you can estimate its size (for example, by file size divided by average word length). I first ran without "reserve" and printed the bucket count (cout << "bucket_count = " << dict.bucket_count() << "\n";); in my case it was 299951. Then I added "dict.reserve(299951);".
Compiled with -O2: 99-121 ms (occasionally up to 137 ms), a ~33-43 ms (up to 49 ms) speed boost compared with your original code.
What more advanced things can you do to speed it up?
Implement your own hash function for your specific data input. Use a char array instead of an STL string. Only after you have done that, write code with direct OS I/O. As you noticed (and as my measurements also show), the data structure is the bottleneck. If the media is very slow but the CPU is very fast, compress the file and decompress it in your program.
My code is not perfect, but it is still better than anything seen above: http://pastebin.com/gQdkmqt8 (the hash function is from the web and could also be done better)
Could you provide more details about which system (a single one or a range) you plan to optimize for?
Info on time complexities:
These should be links... but I don't have enough reputation yet, as I'm a beginner on Stack Overflow.
Is my answer still relevant to anyone? Please add a comment or vote, as there is no PM system as far as I can see.