4

What is the best way of reading the dictionary from text file and storing it in the sorted container. I've got performance issue when inserting the words to the container. So Here is my code.

std::set<std::string> m_words;
std::stringstream ss(textStr);
std::string buf;
while (ss >> buf)
{
    m_words.insert(m_words.end(), buf);
}

The dictionary file is English dictionary file with 130000 lines. Is there anything else to do to optimize the performance.

Gevorg Adamyan
  • 111
  • 1
  • 7

2 Answers2

4

Instead of adding sorted values to a std::set, you could push the strings to a std::vector or std::deque (Note: Do not use the std::deque of msvc (2013) - is is not better than a std::list).

A std::set should have a good performance finding the proper insertion point, if it is respecting the hint 'at end'. However, std::set is allocating nodes for each element individually.

Having a sequence of data, you can perform a binary search (std::lower_bound) on that data.

See also:

Note: Having C++11 you might consider vector::reserve(huge_amount) and vector::shrink_to fit()

Community
  • 1
  • 1
3

My guess is to use vector, because the contiguous storage is good for performance, reserve an expected amount of entries beforehand, take advantage of move semantics and sort the vector at the end of the file only:

std::vector<std::string> m_words;
m_words.reserve(10000);

std::string buf;
while (ss >> buf)
{
  m_words.push_back(std::move(buf));
  buf.clear();
}

std::sort(m_words.begin(), m_words.end());

(You can also check the file size in a preprocessing phase to get a better hint instead of a fixed one)

Because of moving, the buf must be reset to an empty state.

Unless otherwise specified, all standard library functions that accept rvalue reference parameters (such as std::vector::push_back) are guaranteed to leave the moved-from argument in valid but unspecified state. That is, only the functions without preconditions, such as the assignment operator, may be used on the object after it was moved into a standard library container (cppreference)

erenon
  • 18,838
  • 2
  • 61
  • 93
  • After `std::move(buf)` you should call `buf.clear()` to put the buffer to an initial state! –  May 26 '15 at 17:15