
I have read strings with spaces in them using the following scanf() statement.

scanf("%[^\n]", &stringVariableName);

What is the meaning of the control string [^\n]?

Is this an okay way to read strings with white space?

Deepu
ShadyBears
  • I recommend the use of fgets. – BLUEPIXY Apr 24 '13 at 20:23
  • Don't forget to read the newline you're leaving in the stream. Also, you probably want to specify a maximum length to avoid buffer overflows. – effeffe Apr 24 '13 at 20:24
  • Bluepixy is right. scanf is a big security risk. – ncmathsadist Apr 24 '13 at 20:25
  • "For me, scanf is a pain." -- It's not just you. – Fred Larson Apr 24 '13 at 20:29
  • @ncmathsadist No. scanf isn't a security risk at all. The risk is the programmer who uses it incorrectly, completely ignoring the manual with some over-inflated sense of ego... If anyone who read this page were to take the time to patiently read the manual, they'd know *the other two virtually unknown problems with this code* and how to avoid those problems. – autistic Apr 24 '13 at 21:30
  • @undefinedbehaviour `scanf` *is* a security risk, because it is *possible* to use it unsafely. Blaming the victims of poor API design is a great way to ensure nothing will ever change. – zwol Apr 24 '13 at 22:36
  • @Zack `fgets` is also possible to use unsafely. In fact, that describes C. C isn't a security risk. It's the people who write poor C code who are security risks. Whether or not the design for scanf is *poor* is debatable; That's a matter of preference. The significant fact is that there's a manual that tells you all of the things you can expect to be safe and portable. If *you* don't read manuals before using standard functions (at least for the first time), then *you* are the security risk. Similarly, if you don't follow the road rules, you are a risk on the road. Don't blame the road... – autistic Apr 25 '13 at 08:02
  • @undefinedbehaviour People took that attitude for the past, oh, thirty years, and so there's thirty years of evidence demonstrating that *it doesn't work*. The APIs need to be designed so that the path of least resistance is secure, or else the *API* is wrong. And yeah, I say C is a security risk. It's *possible* to use safely but it's so hard that you oughta look with great suspicion on anyone who proposes to use it for new code. – zwol Apr 25 '13 at 13:37

5 Answers


This means "read anything until you find a '\n'".

This is OK, but it would be better to say "read anything until you find a '\n', or until you have read as many characters as my buffer can hold":

char stringVariableName[256] = "";
if (scanf("%255[^\n]", stringVariableName) == 1)
{
    /* stringVariableName now holds 1 to 255 characters of the line */
}

Edit: removed the & from the argument and added a check of scanf's return value.

Jonatan Goebel
  • +1 for correctly using the size of the variable in the format specification. – Jonathan Leffler Apr 24 '13 at 20:25
  • @JonathanLeffler It's unfortunate that the field width seems to be the only focus. There are two other problems with this code. I wonder if anyone else will spot them... – autistic Apr 24 '13 at 21:02
  • @undefinedbehaviour: Address of array; and testing return value from `scanf()`? Yes, you have to go with what's there... – Jonathan Leffler Apr 24 '13 at 21:41
  • I suppose this is the right point to trot out my [three reasons why `*scanf` should never be used at all](http://stackoverflow.com/questions/15664664/scanf-regex-c/15664816#15664816). But it's been my experience that listing *all* the problems with someone's code when they just want help with one of them is a great way to make them stop paying attention. (If they actually asked for a code review, that's different.) – zwol Apr 24 '13 at 22:38
  • scanf is very useful and powerful, and it works like all other things in C: if you know what you are doing, then go for it. If you don't understand exactly how the function works, then go read the documentation before using it. @undefined-behaviour thanks for the notes, going to fix my answer. – Jonatan Goebel Apr 25 '13 at 12:28

The format specifier "%[^\n]" instructs scanf() to read up to but not including the newline character. From the linked reference page:

    matches a non-empty sequence of character from set of characters.

    If the first character of the set is ^, then all characters not in the set are matched. If the set begins with ] or ^] then the ] character is also included into the set.

If the string is on a single line, fgets() is an alternative, but the newline must be removed because fgets() stores it in the output buffer. fgets() also forces the programmer to specify the maximum number of characters that can be read into the buffer, making a buffer overrun less likely:

#include <stdio.h>
#include <string.h>

char buffer[1024];
if (fgets(buffer, sizeof buffer, stdin))
{
    /* Remove the newline that fgets() stores, if present. */
    char* nl = strrchr(buffer, '\n');
    if (nl) *nl = '\0';
}

It is possible to specify the maximum number of characters to read via scanf():

scanf("%1023[^\n]", buffer);

but it is impossible to forget to do it for fgets(), because the compiler will complain about the missing argument. Of course, the programmer could still specify the wrong size, but at least they are forced to consider it.
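One way to reduce the risk of a mismatched width with scanf() is to derive the field width from the same constant as the buffer size. The sketch below uses the common preprocessor stringizing trick; the names STR, STR_, NAME_LEN and read_name are hypothetical, and sscanf() stands in for scanf() only so the example runs without keyboard input.

```c
#include <stdio.h>

/* Hypothetical helper macros: turn a numeric macro into a string literal
   so the scanf field width comes from the same constant as the buffer. */
#define STR_(x) #x
#define STR(x) STR_(x)

#define NAME_LEN 15                    /* maximum characters, excluding '\0' */

/* Reads at most NAME_LEN non-newline characters from src into name,
   which must have room for NAME_LEN + 1 chars. Returns sscanf's result. */
static int read_name(const char *src, char name[NAME_LEN + 1])
{
    /* The format below expands to "%15[^\n]". */
    return sscanf(src, "%" STR(NAME_LEN) "[^\n]", name);
}
```

For example, `read_name("a line longer than fifteen characters\n", name)` returns 1 and stores only the first 15 characters, "a line longer t". Changing NAME_LEN updates both the buffer size and the format string.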

hmjd

Technically, the behaviour of this code is not well defined. From the C standard's description of the %[ conversion:

    Matches a nonempty sequence of characters from a set of expected characters (the scanset).

    If no l length modifier is present, the corresponding argument shall be a pointer to the initial element of a character array large enough to accept the sequence and a terminating null character, which will be added automatically.

Supposing the declaration of stringVariableName looks like char stringVariableName[x];, then &stringVariableName is a char (*)[x], not a char *. The type is wrong, so the behaviour is undefined. It might work by coincidence, but anything that relies on coincidence doesn't work by my definition.

The only way to form a char * using &stringVariableName is if stringVariableName is a char! That implies the character array is only large enough to accept a terminating null character. If the user enters one or more characters before pressing Enter, scanf will write beyond the end of the character array and invoke undefined behaviour. If the user merely presses Enter, the %[...] directive will fail and not even a '\0' will be written to your character array.
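A small sketch of the type difference described above: the array and its address refer to the same location, which is why the ampersand version often appears to work, but they point to objects of different sizes. compare_types is a hypothetical name, and the size 16 stands in for x.

```c
#include <stdio.h>

/* Shows that an array and its address share a location but differ in type. */
static void compare_types(void)
{
    char stringVariableName[16] = "";

    /* In most expressions the array decays to char *: a pointer to one char. */
    char *first = stringVariableName;

    /* &stringVariableName has type char (*)[16]: a pointer to the whole array. */
    char (*whole)[16] = &stringVariableName;

    printf("same address: %d\n", (void *)first == (void *)whole); /* 1 */
    printf("sizeof *first = %zu\n", sizeof *first);               /* 1 */
    printf("sizeof *whole = %zu\n", sizeof *whole);               /* 16 */
}
```

%[ expects the first kind of pointer (char *), so passing the second kind is a type mismatch even though the address printed would be the same.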


Now, with that all said and done, I'll assume you meant this: scanf("%[^\n]", stringVariableName); (note the omitted ampersand)

You really should be checking the return value!!

A %[ directive causes scanf to retrieve a sequence of characters consisting of those specified between the [ square brackets ]. A ^ at the beginning of the set indicates that the desired set contains all characters except for those between the brackets. Hence, %[^\n] tells scanf to read as many non-'\n' characters as it can, and store them into the array pointed to by the corresponding char *.
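The behaviour of %[^\n] can be tried without a keyboard by applying the same format to a string with sscanf(). A minimal sketch, assuming a hypothetical helper first_line and a 32-byte buffer (hence the width 31):

```c
#include <stdio.h>

/* Copies the characters of src up to (not including) the first '\n'
   into out, storing at most 31 characters plus '\0'.
   Returns what sscanf returns: 1 on success, 0 if src starts with '\n'
   (nothing is stored), EOF if src is empty. */
static int first_line(const char *src, char out[32])
{
    return sscanf(src, "%31[^\n]", out);
}
```

For example, `first_line("hello world\nsecond line", out)` stores "hello world", spaces included. Unlike %s, the scanset does not skip leading white space, so "  indented" keeps its two leading spaces.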

The '\n' will be left unread. This could cause problems. An empty field will result in a match failure. In this situation, it's possible that no data will be copied into your array (not even a terminating '\0' character). For this reason (and others), you really need to check the return value!
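One way to handle both the unread '\n' and the empty-line failure is to check the return value and then consume the rest of the line yourself. The sketch below assumes a hypothetical helper read_line and a 256-byte buffer; treating an empty line as a return value of 0 and silently discarding characters beyond the 255-character limit are design choices of this sketch, not the only options.

```c
#include <stdio.h>

/* Hypothetical helper: reads one line from in with "%255[^\n]", then
   discards everything up to and including the next '\n', so the newline
   left behind by the scanset (and any characters beyond the 255-character
   limit) cannot disturb the next call.
   Returns 1 if a non-empty line was stored in buf, 0 for an empty line
   (buf is set to ""), or EOF when no more input is available. */
static int read_line(FILE *in, char buf[256])
{
    int rc = fscanf(in, "%255[^\n]", buf);
    if (rc == EOF)
        return EOF;
    if (rc == 0)
        buf[0] = '\0';      /* matching failure: fscanf stored nothing */

    int c;
    while ((c = getc(in)) != '\n' && c != EOF)
        ;                   /* drop the rest of an over-long line */
    return rc;
}
```

With the input "first line\n\nthird\n", successive calls return 1 ("first line"), 0 (""), 1 ("third") and then EOF.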

Which manual contains information about the return values of scanf? The scanf manual.

autistic

Other people have explained what %[^\n] means.

This is not an okay way to read strings. It is just as dangerous as the notoriously unsafe gets, and for the same reason: it has no idea how big the buffer at stringVariableName is.

The best way to read one full line from a file is getline, but not all C libraries have it. If you don't, you should use fgets, which knows how big the buffer is, and be aware that you might not get a complete line (if the line is too long for the buffer).
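A sketch of the fgets approach that also tells the caller whether the complete line fit, assuming a hypothetical helper read_line_fgets. When the line is too long, the remaining characters stay in the stream and are returned by the next call:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads a line with fgets, strips the trailing '\n',
   and sets *complete to 1 if the whole line fit in buf, 0 if more of the
   same line is still waiting in the stream.
   Returns buf, or NULL at end of input / on error (like fgets). */
static char *read_line_fgets(char *buf, int size, FILE *in, int *complete)
{
    if (fgets(buf, size, in) == NULL)
        return NULL;

    size_t len = strcspn(buf, "\n");   /* index of '\n', or strlen(buf) */
    if (buf[len] == '\n') {
        buf[len] = '\0';               /* whole line read: drop the newline */
        *complete = 1;
    } else {
        int c = getc(in);              /* peek at the next character */
        if (c == '\n' || c == EOF) {
            *complete = 1;             /* line ended exactly at the limit, or at EOF */
        } else {
            ungetc(c, in);             /* put it back for the next call */
            *complete = 0;
        }
    }
    return buf;
}
```

With a 5-byte buffer and the input "abcdefghij\nxyz", successive calls yield "abcd" and "efgh" (incomplete), then "ij" and "xyz" (complete), then NULL.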

zwol
  • @undefinedbehaviour I can imagine a compiler that rewrites `char buf[N]; ... gets(buf);` as `fgets(buf, N, stdin);` but as far as I know, no compiler in production use actually does that. Is that what you are thinking of? Otherwise, please explain. – zwol Apr 24 '13 at 20:58
  • How does `realloc` know how many bytes to copy from the old to the new object? – autistic Apr 24 '13 at 21:08
  • @undefinedbehaviour: `realloc()` has to be designed to know how much space to copy; the memory allocators are responsible for knowing how much space was allocated. The situation for `gets()` is different; in general, it has no way of knowing the size of the array (argument to function is passed to `gets()`). – Jonathan Leffler Apr 24 '13 at 21:47

Reading from the man pages for scanf()...

[ Matches a non-empty sequence of characters from the specified set of accepted characters; the next pointer must be a pointer to char, and there must be enough room for all the characters in the string, plus a terminating null byte. The usual skip of leading white space is suppressed. The string is to be made up of characters in (or not in) a particular set; the set is defined by the characters between the open bracket [ character and a close bracket ] character. The set excludes those characters if the first character after the open bracket is a circumflex (^). To include a close bracket in the set, make it the first character after the open bracket or the circumflex; any other position will end the set. The hyphen character - is also special; when placed between two other characters, it adds all intervening characters to the set. To include a hyphen, make it the last character before the final close bracket. For instance, [^]0-9-] means the set "everything except close bracket, zero through nine, and hyphen". The string ends with the appearance of a character not in the (or, with a circumflex, in) set or when the field width runs out.

In a nutshell, [^\n] means: read every character up to (but not including) the first \n, and store those characters in the array pointed to by the matching argument.
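The man page's example set can be checked with sscanf(). A minimal sketch with a hypothetical helper prefix_without_digits; the range 0-9 relies on the library treating - as a range, which glibc does, whereas ISO C leaves that case implementation-defined:

```c
#include <stdio.h>

/* Reads the longest prefix of src made of characters that are NOT a close
   bracket, a digit, or a hyphen, using the man page's scanset [^]0-9-]:
   ']' right after '^' is literal, "0-9" is a range (implementation-defined
   in ISO C, supported by glibc), and the final '-' is literal.
   Stores at most 31 characters plus '\0'. Returns sscanf's result. */
static int prefix_without_digits(const char *src, char out[32])
{
    return sscanf(src, "%31[^]0-9-]", out);
}
```

For example, "abc7def" yields "abc", "ab]cd" and "xy-z" stop at the bracket and the hyphen, and "5abc" fails with a return value of 0.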

K Scott Piel