4

If I am correct, doing something like:

char *line;

Then I must allocate some memory and assign it to line, is that right? If I am right, the question is the following:

In a line like

while (fscanf(fp,"%[^\n]", line) == 1) { ... } 

without assigning any memory to line I am still getting the correct lines and the correct strlen counts on such lines.

So, does fscanf assign that memory for me and it also places the '\0'? I saw no mention about these 2 things on the spec for fscanf.

Jonathan Leffler
SJPRO
  • See also [Why do you specify the size when using malloc in C?](http://stackoverflow.com/q/1240970/100754) – Sinan Ünür Apr 07 '15 at 04:24
  • Note: `%[^\n]` will fail if it doesn't match at least one character, which means that it will fail on completely blank lines. Perhaps that was your intent. – rici Apr 07 '15 at 04:36
  • @rici my actual code is %[0-9 ' ' \n] .. but too much hazard for writing it in this question. the textfiles I will run contain 6 characters of 2 digits separated by spaces or a dash on each line. I will then atoi all the 2 digits (because I am not sure if %d works using txt files, not programming for years, lol). But thanks ! – SJPRO Apr 07 '15 at 04:58
  • %d will work fine. Why wouldn't it? fscanf is for text files. – rici Apr 07 '15 at 05:16
  • @rici Idk, %d sounds like it is for a binary integer; 06 in a text file is two one-byte chars. That's my stupid intuition, lol. – SJPRO Apr 07 '15 at 05:25
  • printf with `%d` doesn't print "in binary". scanf and printf are analogues. – rici Apr 07 '15 at 05:38
  • @rici that's right, that's why I thought fscanf wouldn't read in binary, or parse it. Thanks :) – SJPRO Apr 07 '15 at 05:52

5 Answers

7

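The POSIX scanf() family of functions will allocate memory if you use the m assignment-allocation modifier (a in some older, pre-POSIX glibc versions). Note: when fscanf allocates, it expects a char ** argument, i.e. the address of your char * (see man scanf), and you must free() the buffer yourself afterwards. E.g.: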

while(fscanf(fp,"%m[^\n]", &line) == 1) { ... }

I would also suggest consuming the newline with "%m[^\n]%*c". I agree with the other answers that suggest using line-oriented input instead of fscanf. (e.g. getline -- see: Basile's answer)
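As a minimal sketch of how the `%m` form might be used in a loop (assuming a libc that supports the POSIX.1-2008 `m` modifier, such as glibc; the helper name `total_line_chars` is made up for this illustration):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: sums the lengths of the lines in fp and stores the
   number of lines read in *nlines. Stops at the first empty line, because
   %[^\n] needs at least one character to match. */
size_t total_line_chars(FILE *fp, size_t *nlines)
{
    assert(fp != NULL && nlines != NULL);
    char *line = NULL;
    size_t total = 0;
    *nlines = 0;
    /* %m[^\n] makes fscanf allocate a NUL-terminated buffer and store its
       address in line; %*c reads and discards the newline and is not
       counted in the return value. */
    while (fscanf(fp, "%m[^\n]%*c", &line) == 1) {
        total += strlen(line);
        ++*nlines;
        free(line);   /* the caller owns every buffer fscanf hands back */
        line = NULL;
    }
    return total;
}
```

Each successful conversion hands back a new buffer, so the sketch frees it on every iteration rather than once after the loop.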

Jonathan Leffler
David C. Rankin
  • Interesting. Does it also "reallocate"? If I consume and discard the newline with %*c, is it still considered a match (for the purposes of the return value)? EDIT: Also, you put an & before line; is it required when using %m, or just a typo? – SJPRO Apr 07 '15 at 04:41
  • I have not read whether or how the `scanf` family reallocates. When you **read and discard** it is **not** added as a match. – David C. Rankin Apr 07 '15 at 04:42
  • Sorry about the question on the ampersand, I did not catch the char ** – SJPRO Apr 07 '15 at 05:03
5

See the C FAQ:

Q: I just tried the code

char *p;
strcpy(p, "abc");

and it worked. How? Why didn't it crash?

A: You got lucky, I guess. The memory randomly pointed to by the uninitialized pointer p happened to be writable by you, and apparently was not already in use for anything vital. See also question 11.35.

And, here is a longer explanation, and another longer explanation.

Community
Sinan Ünür
4

To read entire lines with a recent C library on POSIX systems, you should use getline(3). It allocates (and reallocates) the buffer holding the line as needed. See the example on the man page.
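A minimal sketch of a getline loop, assuming a POSIX.1-2008 system; the helper name `longest_line` is invented for this illustration and is not taken from the man page:

```c
#define _POSIX_C_SOURCE 200809L  /* exposes getline() in <stdio.h> */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical helper: returns the length of the longest line in fp,
   not counting the trailing newline. */
size_t longest_line(FILE *fp)
{
    assert(fp != NULL);
    char *line = NULL;   /* NULL with cap == 0 asks getline to allocate */
    size_t cap = 0;      /* current capacity, maintained by getline */
    size_t longest = 0;
    ssize_t len;
    while ((len = getline(&line, &cap, fp)) != -1) {
        /* getline keeps the newline; strip it before measuring */
        if (len > 0 && line[len - 1] == '\n')
            line[--len] = '\0';
        if ((size_t)len > longest)
            longest = (size_t)len;
    }
    free(line);          /* one buffer is reused and grown across calls */
    return longest;
}
```

Unlike the `%m` approach, the same buffer is reused for every line, so a single free after the loop is enough.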

If you have a non-POSIX system without getline, you can use fgets(3), but then you have to manage the buffer yourself: pre-allocate a line buffer (e.g. with malloc) before calling fgets, check whether what you read ends with a newline, and if it does not, grow the buffer (e.g. with realloc) and call fgets again. Something like:

 //// UNTESTED CODE
 // needs <stdbool.h>, <stdio.h>, <stdlib.h> and <string.h>; belongs inside a function
 size_t linsiz= 64;
 char* linbuf= malloc(linsiz);
 if (!linbuf) { perror("malloc"); exit(EXIT_FAILURE); }
 memset(linbuf, 0, linsiz); // not sizeof(linbuf), which is only the size of the pointer
 bool gotentireline= false;
 char* linptr= linbuf;
 do {
   // read into the unused tail of the buffer
   if (!fgets(linptr, (int)(linbuf+linsiz-linptr-1), stdin))
     break; // end of file or read error
   if (strchr(linptr, '\n'))
     gotentireline= true;
   else {
     // no newline yet: move to a bigger buffer and keep reading after the old text
     size_t newlinsiz= 3*linsiz/2+16;
     char* newlinbuf= malloc(newlinsiz);
     size_t oldlen= strlen(linbuf);
     if (!newlinbuf) { perror("malloc"); exit(EXIT_FAILURE); }
     memset(newlinbuf, 0, newlinsiz); // might be not needed
     strncpy(newlinbuf, linbuf, linsiz);
     free(linbuf);
     linbuf= newlinbuf;
     linsiz= newlinsiz;
     linptr= newlinbuf+oldlen;
   }
 } while(!gotentireline);
 /// here, use the linbuf, and free it later

A general rule would be to always initialize pointers (e.g. declare char *line=NULL; in your case), and to always test for failure of malloc, calloc and realloc. Also, compile with all warnings and debug info (gcc -Wall -Wextra -g). It could have given you a useful warning.

BTW, I love to clear every dynamically allocated memory, even when it is not very useful (because the behavior is then more deterministic, making programs easier to debug).

Some systems also have valgrind to help detect memory leaks, buffer overflows, etc.

Basile Starynkevitch
  • Oh, great! I will check it out !! – SJPRO Apr 07 '15 at 04:41
  • Not sure, the truth is I installed the worst compiler ever (probably). Turbo C for windows 7. I did not write any code since college (like 7/8 years). So I am going blind, trial/error. I just want to write some code to process a file that will contain some lottery numbers per line (6 each line) from past years to calculate statistics and how much a given number appears on its own and also related to the other numbers in the same game. – SJPRO Apr 07 '15 at 04:51
  • i.e 01 02 03 04 05 06\n 01 02 03 04 05 06\n I should end up having: 01 appeared 2 times (like the rest) 01 appeared with 02 2 times .... xx appeared with xx n times. I will use an array for that I guess. – SJPRO Apr 07 '15 at 04:51
  • @SJPRO: installing Linux on your laptop would be very useful. Indeed, TurboC is probably not the best compiler you could get. If you are forced to use Windows, try perhaps [MINGW](http://mingw.org/) – Basile Starynkevitch Apr 07 '15 at 04:52
  • Yes, I had a linux VM for some time, and I am pretty sure I can write the code I need using bash scripting, because linux commands are cool and powerful. Tried to do this with batch but I almost ended crying, so moved to what I remember from C. – SJPRO Apr 07 '15 at 05:01
  • The point is that a recent [GCC](http://gcc.gnu.org/) with *all warnings enabled* is very useful. – Basile Starynkevitch Apr 07 '15 at 05:05
3

line is uninitialized and doesn't point to any valid memory location, so what you see is undefined behavior.

You need to allocate memory for your pointer before writing something to it.
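For example, with a fixed-size buffer the question's fscanf call could be bounded as in the sketch below. The 256-byte size and the helper name `read_line_256` are assumptions made for illustration; fscanf stores the terminating '\0' itself, so the field width has to be one less than the buffer size.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf, which must hold at least
   256 bytes. Returns 1 on success, 0 at end of file or on an empty line. */
int read_line_256(FILE *fp, char buf[256])
{
    assert(fp != NULL && buf != NULL);
    /* At most 255 characters are stored, then fscanf appends '\0'. */
    if (fscanf(fp, "%255[^\n]", buf) != 1)
        return 0;
    /* Discard the rest of an over-long line and the newline itself. */
    int c;
    while ((c = getc(fp)) != EOF && c != '\n') {
        /* nothing to do */
    }
    return 1;
}
```

A caller would declare `char line[256];` (or malloc 256 bytes) and pass it in, so the memory exists before fscanf writes to it.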

PS: If you are trying to read the whole line, then fgets() is a better option. Note that fgets() keeps the newline character in the buffer.
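A small sketch of the fgets() approach, with the trailing newline removed afterwards. The helper name `read_line_stripped` is hypothetical, and the caller is assumed to supply a buffer large enough for the longest expected line:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line with fgets and removes the trailing
   newline, if present. Returns buf, or NULL at end of file or on error. */
char *read_line_stripped(char *buf, int size, FILE *fp)
{
    assert(buf != NULL && size > 0 && fp != NULL);
    if (fgets(buf, size, fp) == NULL)
        return NULL;
    /* strcspn gives the index of the first '\n', or of the terminating
       '\0' when the line had no newline, so this write is always in bounds. */
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

Unlike `%[^\n]`, this also returns blank lines (as empty strings) instead of failing on them.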

Gopi
  • Thanks. What about the '\0'? Does fscanf add it? It seems like it, but I don't know. – SJPRO Apr 07 '15 at 04:28
  • @SJPRO You should have memory allocated to hold the null character as well. When a valid string is read, you will get a null-terminated string. – Gopi Apr 07 '15 at 04:30
2

Nope. You're just getting lucky undefined behavior. It's completely possible that an hour from now, the program will segfault instead of run as expected or that a different compiler or even the same compiler on a different machine will produce a program that behaves differently. You can't expect things that aren't specified to happen, and if you do, you can't expect it to be any form of reliable.

Corbin
  • Oh, thanks. What amount of memory should I allocate? I am reading text files, and it could be any random file. Sorry, I have not written any code for years now. – SJPRO Apr 07 '15 at 04:31
  • @SJPRO If you know the maximum size a line will ever be, you can use that (although at that point you should allocate it as an automatic thing instead of malloc). If that's not known up front, things get a lot more complicated. You'll either need to use `realloc` + `fgets`, a compiler extension or a library. – Corbin Apr 07 '15 at 04:33
  • But `fscanf` **will** allocate memory for `line` if you tell it to. – David C. Rankin Apr 07 '15 at 04:37
  • @DavidC.Rankin If you're using compiler extensions. – Corbin Apr 07 '15 at 04:42
  • Yes, they have been in common use for a while, but there are still some compilers around that do not provide this capability. Good catch. – David C. Rankin Apr 07 '15 at 04:46
  • Ohh, alright. Because I would go crazy with this arcane Turbo C IDE, lol ! – SJPRO Apr 07 '15 at 04:54