0

This function runs for a bit, then proc_index variable goes to -1886854513. Is there something wrong with the code?

int parse_words(vector< vector<string> > &temp_word_vec, int num_of_sub_lists)
{
    char word[MAX_WORD_LENGTH+1]; // +1 makes room for a newline character
    int proc_index = 0; //index of word arr for child process "proc_index"
    string word_str;

    cerr << "point1\n";
    while(fscanf (stdin, "%s", &word) != EOF)
    {
        cerr << "point2\n";
        for(int i = 0; i < MAX_WORD_LENGTH; i++)
        {
            word[i] = tolower(word[i]);
            if(word[i] == '\0')
            {
                word_str.push_back('\n');
                word_str.push_back('\0');
                break;
            }
            if(isalpha(word[i]))
            {
               word_str.push_back(word[i]);
            }
        }
        cerr << "point3, proc_index = " << proc_index << ", word is " << word_str << "\n";

        temp_word_vec[proc_index].push_back(word_str);

        ++proc_index;

        if(proc_index == num_of_sub_lists)
            proc_index = 0;
        word_str.clear();
    }

    return 0;
}
Marty
  • 5,926
  • 9
  • 53
  • 91
  • What's the maximum value for `num_of_sub_lists`? Can `num_of_sub_lists` ever be negative? – Xeo Feb 14 '12 at 05:14
  • 1
    Are you sure that MAX_WORD_LEN is really the max? If not, you could be overwriting the next thing on the stack, which is procIndex. – user1118321 Feb 14 '12 at 05:16
  • @user1118321: But regarding that the input stream is unsanitized, `MAX_WORD_LEN` is never enough. To brake the program, just `program < /dev/random`. – Sebastian Mach Feb 14 '12 at 05:38
  • @phresnel: well, on average, that would give you roughly a 256-character line. If you wanted to ensure failure, you'd have to strip out the newlines, something like : `cat /dev/random | sed '/d' | program`. But now I'm just being picky :-) – paxdiablo Feb 14 '12 at 05:51
  • @paxdiablo: I consider this as worth knowing :) – Sebastian Mach Feb 14 '12 at 05:59

2 Answers2

2

It's almost certainly due to corruption, most likely caused by you reading more bytes into word than you've allocated for it.

Easy way to detect, change:

cerr << "point2\n";

to:

cerr << "point2 maxword = " << MAX_WORD_LENGTH <<
    ", strlen = " << strlen (word) << '\n';

As an aside, you never want to do an unbounded *scanf("%s") on data you don't totally control. Use a bound (like "%20s") or, better yet since you're only after character data, use fgets which can prevent buffer overflows.

Or, even better than that, use C++ strings with getline, rather than some weird C/C++ hybrid :-)

Community
  • 1
  • 1
paxdiablo
  • 854,327
  • 234
  • 1,573
  • 1,953
  • 2
    And even easier to fix: Change `word` to be a `std::string` and use `std::getline` instead of `fscanf`. – Xeo Feb 14 '12 at 05:23
  • Though you can't make `std::getline` match "a sequence of non-whitespace characters". I guess `istream::operator>>` fits the purpose better. – Sebastian Mach Feb 14 '12 at 05:40
  • @FrederickCraine: seriously, that's the way I work now. If I strike a problem, I spend 15 minutes trying to solve it. If I haven't nutted it out by then, I ask on SO (or one of the sister sites usually since I usually figure out my own programming problems) then return to the problem, periodically checking back here. I find my problems get solved a lot faster when there's a few thousand people thinking about them :-) – paxdiablo Feb 14 '12 at 05:41
  • @paxdiablo True, but I always want to make sure it's not a bad question before I ask it. I've gotten enough questions ridiculed and closed on here that I'm hesitant to ask noob questions. – Marty Feb 14 '12 at 05:44
  • and another one... http://stackoverflow.com/questions/9272276/can-you-specify-what-isnt-a-delimiter-in-stdgetline – Marty Feb 14 '12 at 05:45
  • @FrederickCraine: Have you read all answers already? `std::istream::operator>>` does nearly the same as `%s` – Sebastian Mach Feb 14 '12 at 05:52
2
while(fscanf (stdin, "%s", &word) != EOF)

This line is fishy. Form the preconditions you've described neither you nor fscanf know whether there is enough space in word *. You can fix it the easy way:

std::string word;
while (stdin >> word)

If performance is an issue, you can deactivate synchronization with C streams (but you have to get rid of all C style IO inbetween):

const bool sync_prev = ios_base::sync_with_stdio (false);
...
ios_base::sync_with_stdio (sync_prev);

*: Actually, because you are reading from an unsanitized stream (stdin), every user can, consciously or not, brake your program and possibly strike a security breach into the whole system.

Sebastian Mach
  • 38,570
  • 8
  • 95
  • 130