3

My program has the following requirements: If a command line argument is given, interpret it as a file name and read the input from that file. Otherwise, read input from stdin instead. As I am going to need the input later, I want to save it into an array. And because any non-ASCII characters are to be ignored, I decided to process the input character by character. This is my code:

#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

#define MAX_WORDS 999
#define MAX_WORD_LENGTH 50

typedef struct Data{
    char words[MAX_WORDS][MAX_WORD_LENGTH+1];
    int numwords;
} Data;

Data* data;

int main(int argc, char** argv){
    data= malloc(sizeof data);

    FILE* infile;
    if(argc<=1){//if no arguments, read words from stdin
        infile= stdin;
    }else if(argc==2){//read words from file
        infile= fopen(argv[1], "r");
        if(infile==NULL){
            printf("failed to open word file");
            return 1;
        }
    }

    char c;
    int wordindex= 0;
    int charindex= 0;
    while(1){//read input character by character, only accepting ASCII characters
        c= fgetc(infile);
        if(c=='\n' || c==EOF){
            data->words[wordindex][charindex]= '\0';
            wordindex++;
            charindex= 0;
            if(wordindex>=MAX_WORDS || c==EOF)
                break;
            continue;
        }
        if(!isascii(c))
            continue;

        data->words[wordindex][charindex]= toupper(c);
        charindex++;
        if(charindex>=MAX_WORD_LENGTH){
            wordindex++;
            charindex= 0;
            if(wordindex>=MAX_WORDS)
                break;
        }
    }
    if(argc==2) fclose(infile);

    data->numwords= wordindex-1;


    //check if everything worked as intended
    printf("==================\n%d word(s) read:\n", data->numwords);
    for (int i = 0; i < data->numwords; i++)
        printf("%d %s\n", (int)strlen(data->words[i]), data->words[i]);
}

Everything works fine if I enter the input through stdin, but if I attempt to read the input from a text file, the program segfaults. It seems to work if the text file contains only one line of text, but if there are two or more then it crashes. I'm a beginner in C and I don't see any difference between reading from stdin or a file, so I have no idea why this is happening. Can somebody enlighten me?

alk
  • 69,737
  • 10
  • 105
  • 255
Aran-Fey
  • 39,665
  • 11
  • 104
  • 149
  • Unrelated to your problem, but why allocate the single structure dynamically if you're just allocating one? Or generally allocate a number of structured decided at compile-time? Why not use arrays of the structure? If you put the array as a global variable then it won't take up stack-space if that's what you're worried about. – Some programmer dude Jun 04 '16 at 18:13
  • 2
    Also note that [`fgetc`](http://en.cppreference.com/w/c/io/fgetc) doesn't return a `char`. This is actually very important. – Some programmer dude Jun 04 '16 at 18:15
  • @JoachimPileborg I didn't have a particular reason to use malloc there, it's just because I'm a noob :) Anyway, I don't understand why it matters if `fgetc` returns an int, I just checked and the comparison `c==EOF` works just fine as it is. – Aran-Fey Jun 04 '16 at 18:26
  • 2
    The problem with `fgetc` and `EOF` is that `EOF` is `-1`, and like all integer literals `-1` is an `int`, and its value (on a two-complement 32-bit machine) is `0xffffffff`. A `char` is usually a byte, eight bits, and then negative one has the value `0xff`, or as a 32-bit value `0x000000ff`. And since `char` might be `unsigned` (the signedness of `char` is not defined by the standard) you might compare `0x000000ff` to `0xffffffff` which will not yield the correct result. – Some programmer dude Jun 04 '16 at 18:42
  • 1
    @JoachimPileborg That's a great explanation, thank you. I'll go and fix it immediately. – Aran-Fey Jun 04 '16 at 18:45
  • this line: `Data *data;` is not correct, it should be: `Data data;` so it is creating an instance of the array, rather than a pointer to the array. Then this line: `data= malloc(sizeof data);` can be/ needs to be removed – user3629249 Jun 07 '16 at 12:44
  • the line: `if(!isascii(c))` is not the right thing to use. strongly suggest: `if( !isalpha(c) && !isdigit(c) )` – user3629249 Jun 07 '16 at 12:54
  • the posted code has a logic problem, for instance, if the char read is a space or a comma or a seimicolon or .... Then the posted code places that char into the word, but such characters are not part of a word and should actually end that word, step to the next word. Also, the posted code assumes there is only one word per line (but your question does not state that constraint) – user3629249 Jun 07 '16 at 12:56
  • the posted code has a potential memory leak, It is best practice to NOT expect the OS to cleanup the mess left by a program. I.E. for every call to any memory allocation function (malloc, calloc, realloc) there should be an associated call to `free()` – user3629249 Jun 07 '16 at 13:00
  • [What happens when the return value of `getchar` is not stored in an `int`](https://stackoverflow.com/questions/35356322/difference-between-int-and-char-in-getchar-fgetc-and-putchar-fputc) – Antti Haapala -- Слава Україні Nov 05 '17 at 13:28

1 Answers1

2

This

Data* data;
data= malloc(sizeof data);

allocates bytes to suite the size of data, with data being "just" a pointer to Data, not Data itself. A pointer is 4/8 bytes depending whether on a 32/64 bit platform.

Allocating to few memory here leads to writing to invalid memory soon and with this invoking the infamous Undefined Behaviour. From this moment on anything can happen ranging from crash to nothing.

If you want the number of bytes to hold Data you want to allocate like this

data = malloc(sizeof (Data));

of even better like this

data = malloc(sizeof *data);

Also one should test the outcome of relevant system call, also malloc() could fail:

if (NULL == data)
{
  perror("malloc() failed");
  exit(EXIT_FAILURE);
}
alk
  • 69,737
  • 10
  • 105
  • 255
  • You're right, changing the code to `sizeof *data` worked. I'd like to understand why my code didn't crash when reading from stdin though, it would be great if you could explain that. – Aran-Fey Jun 04 '16 at 18:30
  • @rawing: I just added some words on undefined behaviour. – alk Jun 04 '16 at 18:41
  • Oh, just a coincidence then, more or less? Guess I was reading too much into it. – Aran-Fey Jun 04 '16 at 18:43
  • @rawing: Just "bad" luck I'd say. Have a look at Valgrind to test for such issues. – alk Jun 04 '16 at 18:45
  • In C running code isn't necessarily correct code ... @rawing – alk Jun 04 '16 at 18:47