
I have a basic C program that reads some lines from a text file containing hundreds of lines in its working directory. Here is the code:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <ctype.h>
#include <string.h>
#include <locale.h>
#include <wchar.h>
#include <wctype.h>
#include <unistd.h>

int main(int argc, const char * argv[]) {
    srand((unsigned)time(0));
    char *nameFileName = "MaleNames.txt";
    wchar_t line[100];
    wchar_t **nameLines = malloc(sizeof(wchar_t*) * 2000);
    int numNameLines = 0;
    FILE *nameFile = fopen(nameFileName, "r");
    while (fgetws(line, 100, nameFile) != NULL) {
        nameLines[numNameLines] = malloc(sizeof(wchar_t) * 100);
        wcsncpy(nameLines[numNameLines], line, 100);
        numNameLines++;
    }
    fclose(nameFile);

    wchar_t *name = nameLines[rand() % numNameLines];
    name[wcslen(name) - 1] = '\0';
    wprintf(L"%ls", name);

    int i;
    for (i = 0; i < numNameLines; i++) {
        free(nameLines[i]);
    }
    free(nameLines);
    return 0;
}

It basically reads my text file (its name is hard-coded, and the file exists in the working directory) line by line. The rest is irrelevant. It runs perfectly and as expected on my Mac (with llvm/Xcode). When I compile it (nothing fancy, just gcc main.c) and run it on a Linux server, it either:

  • Exits with error code 2 (meaning no lines are read).
  • Reads only first 3 lines from my file with hundreds of lines.

What causes this nondeterministic (and incorrect) behavior? I've tried commenting out the first line (the random seed) and compiling again; then it always exits with return code 2.

What is the relation between the random functions and reading a file, and why am I getting this behavior?

UPDATE: I've changed the malloc from sizeof(wchar_t) * 50 to sizeof(wchar_t) * 100. It didn't change anything. My lines are about 15 characters at most, and there are far fewer than 2000 lines (this is guaranteed).

UPDATE 2:

  • I've compiled with -Wall, no issues.
  • I've compiled with -Werror, no issues.
  • I've run valgrind; it didn't find any leaks either.
  • I've debugged with gdb, it just doesn't enter the while loop (fgetws call returns 0).

UPDATE 3: I'm getting a floating point exception on Linux, as numNameLines is zero.

UPDATE 4: I've verified that I have read permissions on MaleNames.txt.

UPDATE 5: I've found that accented, non-English characters (e.g. Â) cause problems while reading lines. fgetws halts on them. I've tried setting the locale (both setlocale(LC_ALL, "en.UTF-8"); and setlocale(LC_ALL, "tr.UTF-8"); separately) but it didn't work.

Can Poyrazoğlu
  • Have you tried debugging it? – Rowland Shaw Jun 30 '15 at 11:52
  • You're passing 100 to `fgetws`, but only allocating space for up to 49 characters in your `malloc` call. Are you sure none of the lines are > 49 characters? Also, there are several points in the code where a call can fail and you have absolutely no error checking - this is just asking for trouble and will typically result in a lot more time spent debugging. – Paul R Jun 30 '15 at 11:52
  • Read about **[undefined behavior](http://en.wikipedia.org/wiki/Undefined_behavior)** and be *very scared* of it. – Basile Starynkevitch Jun 30 '15 at 12:08
  • Also, compile your program with `-Wall -Werror`, then run it with `valgrind`. Once you've got that to work with no errors, and have had a go with `gdb`, *then* come back. – abligh Jun 30 '15 at 12:10
  • @PaulR see my updated question. – Can Poyrazoğlu Jun 30 '15 at 12:14
  • @abligh see my updated question. – Can Poyrazoğlu Jun 30 '15 at 12:29
  • `rand.c: No such file or directory.` just means you haven't got the `glibc` sources installed. Step over it (or out of it if you are already in it). – abligh Jun 30 '15 at 12:30
  • Suggest using `wcsncpy` not `wcscpy` so your code is safe if one of the lines is too long, and checking you do not overrun the array. If you are using `valgrind` it would have noticed you never free the lines. Please post a minimal *compilable* example that demonstrates the problem (plus test files at pastebin or similar). That way we can see what you are on about. It is also a useful exercise in isolating the issue. – abligh Jun 30 '15 at 12:35
  • @abligh did it, nothing changed. valgrind did initially notice that I wasn't freeing them, so I've added that to the end of the program (the redacted [...] part). I'm working on an sscce. – Can Poyrazoğlu Jun 30 '15 at 12:44
  • @abligh I've updated the code, now it compiles. – Can Poyrazoğlu Jun 30 '15 at 12:53
  • @CanPoyrazoğlu OK, that runs without an issue here provided the text file exists with that name (else it SEGV's as the return value of `fopen()` is not checked). Can you post (at pastebin.org or similar) the `MaleNames.txt` you are using? – abligh Jun 30 '15 at 13:00
  • @CanPoyrazoğlu re floating point exception, you aren't using any floating point. Does your toolchain actually work? IE does printing 'hello world' work? – abligh Jun 30 '15 at 13:01
  • @abligh Sure, it works. I think this is the reason to FPE: http://stackoverflow.com/questions/1081250/why-is-this-a-floating-point-exception – Can Poyrazoğlu Jun 30 '15 at 13:04
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/81974/discussion-between-can-poyrazoglu-and-abligh). – Can Poyrazoğlu Jun 30 '15 at 13:09
  • `name[wcslen(name) - 1] = '\0';` will do bad things if you have an empty string (i.e. a blank line in the file). – Paul R Jun 30 '15 at 13:50
  • @PaulR I know. The file is guaranteed to be in the correct format. No blank lines, long lines etc. – Can Poyrazoğlu Jun 30 '15 at 13:55
  • You like living dangerously, don't you ? No error checking on `fopen` (or anywhere else for that matter), and many assumptions about the file format. It's easy to get a blank line at the end of the file, BTW. – Paul R Jun 30 '15 at 14:16
  • @PaulR :) if the file came from somewhere else, or if it was a front-facing program, then you'd be absolutely right. But this is a personal tool to run on a private server, and it works with a long precompiled text file that I've created myself. – Can Poyrazoğlu Jun 30 '15 at 14:18
  • @CanPoyrazoğlu: famous last words. ;-) It's up to you of course, but I always code defensively, even if it's throwaway code - it can save a lot of time in the long run, and who knows, maybe you might want to reuse that code later for something else. Anyway, good luck! – Paul R Jun 30 '15 at 16:10

2 Answers


fgetws() is attempting to read up to 100 wide characters. The malloc() call in the loop allocates 50 wide characters.

The wcscpy() call copies all the wide characters read. If more than 50 wide characters have been read (including the terminating nul) then wcscpy() will overrun the allocated buffer. That results in undefined behaviour.

Instead of multiplying by 50 in the loop, multiply by 100. Or, better yet, compute the length of string read and use that.

Independently of the above, your code will also overrun a buffer if the file contains more than 2000 lines. Your loop needs to check for that.

A number of the functions in your code can fail, and will return a value to indicate that. Your code is not checking for any such failures.
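A minimal sketch of the bounds and error checks described above, assuming a hypothetical helper named read_lines and a hypothetical MAX_LINE_LEN constant (neither appears in the question's code). It sizes each allocation from the length actually read, stops at the array capacity, and strips a trailing newline only when one is present:

```c
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

#define MAX_LINE_LEN 100  /* hypothetical limit, same size as the question's buffer */

/* Hypothetical helper: reads at most max_lines lines from an open stream.
 * Returns the number of lines stored, or -1 if an allocation fails. */
int read_lines(FILE *fp, wchar_t **lines, int max_lines) {
    wchar_t buf[MAX_LINE_LEN];
    int count = 0;

    /* Check the capacity first so the array is never overrun. */
    while (count < max_lines && fgetws(buf, MAX_LINE_LEN, fp) != NULL) {
        size_t len = wcslen(buf);

        /* Strip the newline only if there is one; a blank line stays valid. */
        if (len > 0 && buf[len - 1] == L'\n') {
            buf[--len] = L'\0';
        }

        /* Allocate exactly what this line needs, terminator included. */
        lines[count] = malloc((len + 1) * sizeof(wchar_t));
        if (lines[count] == NULL) {
            while (count > 0) {
                free(lines[--count]);
            }
            return -1;
        }
        wcscpy(lines[count], buf);
        count++;
    }
    return count;
}
```

The caller would still need to check the result of fopen() for NULL before calling this, and to make sure the returned count is greater than zero before evaluating rand() % count.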

That your code runs correctly under OS X is happenstance. The behaviour is undefined, which means it can fail on any host system, when built with any compiler. Appearing to run correctly on one system and failing on another is a valid outcome of undefined behaviour.

Peter

Found the solution. It was all about the locale, from the beginning. After experimenting and hours of research, I've stumbled upon this: http://cboard.cprogramming.com/c-programming/142780-arrays-accented-characters.html#post1066035

#include <locale.h>

setlocale(LC_ALL, "");

Setting locale to empty string solved my problem instantly.
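For context, a rough sketch of where that call would go and how the original failure could be detected. The helper names init_locale and count_wide_lines are hypothetical, and the sketch assumes the server's environment (LANG or LC_ALL) names an installed UTF-8 locale. The locale should be set before the first wide-character call on the stream, because the stream's conversion is tied to the locale in effect at that point:

```c
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: adopt the locale named by the environment.
 * Returns 0 on success, -1 if the requested locale is not available
 * (in which case the program stays in the default "C" locale). */
int init_locale(void) {
    return setlocale(LC_ALL, "") != NULL ? 0 : -1;
}

/* Hypothetical helper: counts the lines fgetws() manages to decode.
 * Assumes every line fits in the buffer, as in the question.
 * Returns the count, or -1 if reading stopped because of an error;
 * errno == EILSEQ then points to bytes that are invalid in the
 * current locale's encoding. */
long count_wide_lines(FILE *fp) {
    wchar_t buf[100];
    long count = 0;

    errno = 0;
    while (fgetws(buf, 100, fp) != NULL) {
        count++;
    }
    return ferror(fp) ? -1 : count;
}
```

In the question's program, init_locale() would be called at the top of main, before fopen and fgetws. Checking ferror() and errno after the loop would likely have shown right away that reading stopped on a conversion error rather than at the end of the file.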

Community
Can Poyrazoğlu