1

I have this predicate function which compares two strings alphabetically, the strings being compared are human names so are of unequal length, to get round this the shorter string is padded with white space.

Problem:

Error

I've tracked the bug down to the string padding...which appears to randomly break the string iterator:

ls += std::string(  maxlen - ls.size(), ' '  ) ;
rs += std::string(  maxlen - rs.size(), ' '  ) ;

Here is what the two string iterators look like after successful padding, as you can see they both point to their respective string as they should: healthy string iters

& here are the same string iterators further down the list of names being compared, as you can see riter is now pointing to 'ar5' not "Aaron Tasso" which I'm guessing is the cause of the error: unhealth string iter

I've tried removing the name "Abraham Possinger" from the input but it throws the same error further down the list on another name.

Input:

Aaron Tasso               
Aaron Tier                 
Abbey Wren                 
Abbie Rubloff              
Abby Tomopoulos           
Abdul Veith               
Abe Lamkin                  
Abel Kepley              
Abigail Stocker            
Abraham Possinger  
bool 
alphanum_string_compare( const std::string& s, const std::string& s2 )
#pragma region CODE
{

  // copy strings: ...for padding to equal length..if required?
  std::string ls = s ;
  std::string rs = s2 ;

// string iters
std::string::const_iterator liter = ls.begin() ;
std::string::const_iterator riter = rs.begin() ;

  // find length of longest string
  std::string::size_type maxlen = 0 ;
  maxlen = std::max( ls.size(), rs.size() ) ;  

// pad shorter of the 2 strings by attempting to pad both ;) 
// ..only shorter string will be padded!..as the other string == maxlen
// ..possibly more efficient than finding and padding ONLY the shorter string
ls += std::string(  maxlen - ls.size(), ' '  ) ;
rs += std::string(  maxlen - rs.size(), ' '  ) ;



  // init alphabet order map
  static std::map<char, int> m = alphabet() ;
  //std::map<char, int> m = alphabet();



  while(  liter != ls.end() && riter != rs.end()  ) 
  {
    if ( m[ *liter ] < m[ *riter ]  ) return true ;
    if ( m[ *liter ] > m[ *riter ]  ) return false ;

    // move to next char
    ++liter ;
    ++riter ;
  }




    return false ;

}
#pragma endregion CODE
tuk
  • 367
  • 1
  • 4
  • 10

4 Answers4

3

The problem is that you pad the strings after you assign the iterators.

// string iters
std::string::const_iterator liter = ls.begin() ;
std::string::const_iterator riter = rs.begin() ;

ls += std::string(  maxlen - ls.size(), ' '  ) ; <----------- potentially invalidates iterator
rs += std::string(  maxlen - rs.size(), ' '  ) ; <----------- potentially invalidates iterator

  while(  liter != ls.end() && riter != rs.end()  ) { <--- using invalid iterator
    if ( m[ *liter ] < m[ *riter ]  ) return true ;
    if ( m[ *liter ] > m[ *riter ]  ) return false ;

    // move to next char
    ++liter ;
    ++riter ;
  }

    return false ;    
}

Your padding is unneeded if you check after the loop which has ended and return the correct value of true or false there.

Surt
  • 15,501
  • 3
  • 23
  • 39
2

std::string is a dynamic object, when you modify it it is quite possible that its internal memory buffer will be re-allocated. At this point your "old" iterators point to a memory that was returned to heap (deleted). It's just the same as with most of containers, for example std::vector - you can copy a iterator to an arbitrary element, but once you add anything to the vector, your iterator may be no longer valid. Any Most "modifying" operation invalidate iterators to such objects.

Freddie Chopin
  • 8,440
  • 2
  • 28
  • 58
  • _"Any 'modifying' operation invalidates iterators to such objects."_ Not quite: http://stackoverflow.com/q/6438086/560648 – Lightness Races in Orbit Nov 26 '14 at 16:38
  • @Freddie Chopin Thanks for the quick reply Freddie, the penny dropped when I seen your post, others have since fleshed out the issue. – tuk Nov 26 '14 at 17:33
2

The padding may invalidate the iterator when the underlying storage is reallocated on expansion.

You could fix this by retrieving the iterators after the padding, but the padding is unnecessary.

You just need to check where the iterators ended up - s is less than s2 if its iterator reached the end but the other one didn't.

bool 
alphanum_string_compare( const std::string& s, const std::string& s2 )
{
    static std::map<char, int> m = alphabet();

    std::string::const_iterator left = s.begin();
    std::string::const_iterator right = s2.begin();

    while (left != s.end() && right != s2.end()) 
    {
        if (m[*left] < m[*right])
            return true;
        if (m[*left] > m[*right])
            return false;
        ++left;
        ++right;
    }
    return left == s.end() && right != s2.end();
}
molbdnilo
  • 64,751
  • 3
  • 43
  • 82
  • @ molbdnilo Your code edit works! The issue appears to depend on how many white spaces are added, so short names requiring more padding trigger the 'underlying storage reallocation' which makes sense. It's repeatable with these string args. GOOD: bool b = alphanum_string_compare("Abraham Possinge", "Aaron Tassoooooo"); BAD: bool b = alphanum_string_compare("Abraham Possinge", "Aaron Tassooooo"); – tuk Nov 26 '14 at 17:42
1

I don't think it's necessary to pad with whitespace if you're just going to see which name comes first in alphabetical order. One idea could be: check which character is smallest each time around the loop, if one character is smaller than the other, return that string. Example:

string StrCompare(const string& s1, const string& s2)
{
  string::size_type len = (s1.length() < s2.length() ? s1.length() : s2.length());

  for (auto i = 0; i != len; ++i) {
    if (s1[i] < s2[i])
      return s1;
    else if (s2[i] < s1[i])
      return s2;
    else
      ;// do nothing
  }
}

main()

string str = StrCompare("Aaron Tasso", "Aaron Tier");
cout << str;

Output: Aaron Tasso

Andreas DM
  • 10,685
  • 6
  • 35
  • 62
  • Thanks for the work around. I've been using this code for a while without incident, but have started using new names for the input so was curious what was causing it break on seemingly random names. – tuk Nov 26 '14 at 17:30