0

I might start out by saying that I am usually a C++ programmer, and I am inexperienced in doing I/O with C. I am writing a simple program that is trying to read a list of words from the user, alphabetize the letters in each word (preserving capital letters), and print them back out in the order received.

I seem to have pinpointed the problem area (emphasized near the bottom of my code), yet even with a microscope on my code, it seems to be some kind of byzantine bug to me. Here is my code:

int comp (const void* left, const void* right) { //for qsort
    return ( *(char*)left - *(char*)right );
}

int main() {
    printf("Please enter a sentence. Press enter to end.\n");
    char str[500]; 
    if (!fgets(str, sizeof(str), stdin)) { //grab line
        printf("die\n");
        return 1;
    }
    str[strcspn(str, "\n")] = 0; //finds newline and kills it
    printf("You entered:\n%s\n", str);

    char abc[sizeof(str)];    //alphabetized array
    int abc_count = 0;        //keeps track of effective size of abc
    char c;   
    char* word = &c;  //will read each word from line
    char* strr = str;  //made to get around a compiler error
    int offset;
    char capitals[25]; //keeps track of capital letters
    int cap = 0;
    int i = 0;

    while (sscanf(strr, " %s%n", word, &offset) == 1) { //%n gives number of characters read by sscanf

        if (i == 0) offset++;  //fixes length of first word, which has no space preceding it
        printf("read a word of size %d\n", offset-1);
        //make sure all letters are lowercase
        cap = 0;
        for (int j = 0; j < (offset-1); j++) {
            if ( isupper(word[j])) {  //if letter is capitalized
                word[j] = tolower(word[j]);     //lowercase it for sorting
                capitals[cap] = word[j]; //remember letter for later
                cap++;
            }
        }
        qsort(word, (offset-1)/sizeof(*word), sizeof(*word), comp);    //Alphabetize. "-1" makes offset not count space in length
        //recapitalize letters
        for (int j = 0; j < (offset-1); j++) {
            for (int k = 0; k < cap; k++) {
                if (word[j] == capitals[k]) {
                    word[j] = toupper(word[j]);  //recapitalize letter
                    capitals[k] = 0;    //capital has been used
                    break;
                }
            }
        }
        //write word to abc                ///// PROBLEM AREA /////
        printf("%s\n", word);
        for (int j = 0; j < (offset-1); j++) {
            if (i == 0) printf("first word: %s\n", word); //debugging
            printf("word[%d] = %c | ", j, word[j]);      //debugging
            abc[abc_count] = word[j];
            printf("abc[%d] = %c\n", abc_count, word[j]);  //debugging
            abc_count++;
        }
        if (i == 0) printf("first word: %s\n", word);
        abc[abc_count] = ' ';
        printf("abc[%d] = space\n", abc_count);
        abc_count++;
        if (i == 0) printf("first word: %s\n", word);
        printf("so far: %s\n\n", abc);
        if (i == 0) offset--; //undo correction for first word
        strr += offset; //stops infinite loop by moving pointer. This line is the reason "strr" exists instead of using "array type" str
        i = 1;
    } //while loop                       ///// PROBLEM AREA /////

    printf("Alphabetized: \n");
    for (i = 0; i < abc_count; i++) {
        printf("%c", abc[i]);  //I write directly from memory because somehow a null char is being added to "abc" after every word entered
    }
    printf("\n");

    return 0;
}

What I have found through extensive use of "printf" debugging is that one of my strings (always the first word entered) is being overwritten, as well as the front of the line to be output. It is very bizarre. Here is some sample output:

Please enter a sentence. Press enter to end.
Chocolate word cAT
You entered:
Chocolate word cAT
read a word of size 9
aCcehloot                //word is sorted as desired. No issues
first word: aCcehloot
word[0] = a | abc[0] = a
first word: aacehloot      //here you can see the strange overwrite behavior
word[1] = a | abc[1] = a
first word: aaaehloot
word[2] = a | abc[2] = a
first word: aaaahloot
word[3] = a | abc[3] = a
first word: aaaaaloot
word[4] = a | abc[4] = a
first word: aaaaaaoot
word[5] = a | abc[5] = a
first word: aaaaaaaot
word[6] = a | abc[6] = a
first word: aaaaaaaat
word[7] = a | abc[7] = a
first word: aaaaaaaaa    //first word is completely replaced by its first alphabetical letter
word[8] = a | abc[8] = a
first word: aaaaaaaaaa?n
abc[9] = space
first word: aaaaaaaaaa n
so far: aaaaaaaaa n

read a word of size 4
dorw
word[0] = d | abc[10] = d
word[1] = o | abc[11] = o
word[2] = r | abc[12] = r
word[3] = w | abc[13] = w
abc[14] = space
so far: orw

read a word of size 3
AcT
word[0] = A | abc[15] = A
word[1] = c | abc[16] = c
word[2] = T | abc[17] = T
abc[18] = space
so far: cT

Alphabetized:
cT  aaaaa dorw AcT
//the final result is shown as: everything but the first letter of the last word, 
//a cut-off remainder of the first overwritten word, the second word, then the third word
  • This might be a good time to [learn how to debug your program](https://ericlippert.com/2014/03/05/how-to-debug-small-programs/). I also recommend tools such as [Valgrind](http://valgrind.org/) or similar to help you find buffer overruns. – Some programmer dude Jun 05 '18 at 05:56
  • 3
    The English (Latin) alphabet normally has 26 capitals — not 25 as you've allowed space for. – Jonathan Leffler Jun 05 '18 at 05:57
  • 1
    Please extract a [mcve]. In particular, eliminate the input part, unless of course that has an influence, in which case other parts should probably be removed. – Ulrich Eckhardt Jun 05 '18 at 05:59
  • Ulrich do you mean the entire bottom section, or just the single line where I enter the input? – Christian Coggan Jun 05 '18 at 06:04
  • Gah. You need to adopt a conventional coding style. All those single line `if` statements are nothing but code smell. – Lundin Jun 05 '18 at 06:31
  • Lundin do you have a recommendation for an alternative? I would be interested in making my code less shoddy-looking. – Christian Coggan Jun 05 '18 at 06:46
  • I get the impression the `if (i == 0) …do something…` lines are a conditional debug printing mechanism. Maybe you should consider the ideas in [C `#define` macro for debug printing](https://stackoverflow.com/questions/1644868). – Jonathan Leffler Jun 05 '18 at 06:49
  • Jonathan Two of them are for an edge case, the rest are for debugging because my code is in draft mode. Thanks for the tip. – Christian Coggan Jun 05 '18 at 07:07
  • Run clang-format on the document and you'll see. One line if statements are pretty intolerable to look at. – Roflcopter4 Jun 05 '18 at 10:24
  • I didn't refer to any particular part. However, rather that giving input code and then telling people what to input, use some hard-coded values that demonstrate the issue without any need for interpretation. Take the same approach with every other bit of code and replace or eliminate it if it isn't necessary. The end result should then satisfy the requirements for an MCVE. – Ulrich Eckhardt Jun 05 '18 at 20:45

1 Answers1

2

This part looks strange:

char c;   
char* word = &c;  //will read each word from line

Here word is a char pointer pointing to a single char (i.e. c).

Yet you do:

sscanf(strr, " %s%n", word, &offset)

which will read multiple chars into to the memory pointed to by word. In other words you write to memory beyond c which is undefined bahavior (UB).

I would expect word to be an array of chars (or point to an array of chars) to avoid out of bounds accesses (i.e. UB).

Support Ukraine
  • 42,271
  • 4
  • 38
  • 63
  • Yeah that line is far from elegant, I admit. I just didn't like the compiler complaining at me with a warning about an uninitialized pointer. Is it possible for that declaration to cause a problem for me when I use word as an array? I would be surprised if that caused a problem. – Christian Coggan Jun 05 '18 at 06:13
  • 1
    @ChristianCoggan Quote: "Is it possible for that declaration to cause a problem for me when I use word as an array?" Yes, it will cause problem as you write beyond the allocated space. You need to fix that first. I haven't read all your code in details but I would expect something like `char word[100]` instead. – Support Ukraine Jun 05 '18 at 06:14
  • 2
    @ChristianCoggan You have space allocated for a single character. You then try to write multiple characters to it, writing past that one allocated character, corrupting your stack. C doesn't do your memory allocation for you. I find your surprise surprising. – Tom Karzes Jun 05 '18 at 06:31
  • Wow that fixed it. I thought that if I had a pointer to a spot in memory, then I could act as though that pointer pointed at an array, even if the memory for the array had not been explicitly allocated. Turns out I'm better at writing functions than declaring variables. Thanks! – Christian Coggan Jun 05 '18 at 06:44
  • @ChristianCoggan Glad to help. Just for precision - quote "I thought if I had a pointer to a spot in memory". Well, you did... but the spot of memory was **only** 1 char large. And you needed more chars. – Support Ukraine Jun 05 '18 at 06:47
  • @ChristianCoggan Quote " I just didn't like the compiler complaining at me with a warning about an uninitialized pointer." and now, you know that you have to take seriously each warning that the compiler give you. It's nearly suicidale to hide a warning by doing something you don't clearly understand. C language isn't forgiving at all, and UB is a pain in the a** to deal with when the program have "worked normally" for a long time. I wish you a good continuation. – Tom's Jun 05 '18 at 07:29