2

A comment (which should probably be submitted as an answer) has the code

sscanf(string, "<title>%[^<]</title>", extracted_string);

Running the code seems to copy the text between the <title> tags to extracted_string, but I cannot find any references to a caret in the scanf family, either in the man pages or elsewhere online.

Can someone point me to a resource that explains the use of %[^<], and other similar syntax, in the sscanf() family?

user1717828
  • This is a bad idea, `sscanf()` is not for regular expression matching. – Iharob Al Asimi May 18 '15 at 13:46
  • It is in the [man page](http://linux.die.net/man/3/sscanf) which you linked to. See conversions, `[` – Spikatrix May 18 '15 at 13:46
  • Did you search for "caret" rather than "^"? The manual page calls it the [circumflex](http://en.wikipedia.org/wiki/Circumflex) which seems to be technically wrong. – unwind May 18 '15 at 13:58
  • @unwind, I searched for `^`, which I actually had to [Wiki](http://ddg.gg/?q=!w+^) to find out the spelling was "caret" instead of "carrot" :-D – user1717828 May 18 '15 at 14:06
  • @user1717828 Then I don't understand why you didn't find it on the first man page link you provided. :/ – unwind May 18 '15 at 14:21
  • @unwind WOW I'M RETARDED, probably sausage-fingered the key or something – user1717828 May 18 '15 at 14:32

3 Answers

6

From the C11 standard document, chapter §7.21.6.2, Paragraph 12, conversion specifiers, (emphasis mine)

[

Matches a nonempty sequence of characters from a set of expected characters (the scanset).

....

The conversion specifier includes all subsequent characters in the format string, up to and including the matching right bracket (]). The characters between the brackets (the scanlist) compose the scanset, unless the character after the left bracket is a circumflex (^), in which case the scanset contains all characters that do not appear in the scanlist between the circumflex and the right bracket.

A draft version of the standard, found online.
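To make the quoted wording concrete, here is a minimal sketch contrasting a plain scanset with a negated one. The helper names `leading_digits` and `up_to_comma` and the 16-byte buffer size are assumptions chosen for illustration; they do not come from the standard or from the question.

```c
#include <stdio.h>

/* Hypothetical helper: copies the leading run of decimal digits from
 * `input` into `out`, which must hold at least 16 bytes.
 * Returns 1 if at least one digit was matched, 0 otherwise. */
int leading_digits(const char *input, char out[16])
{
    /* Plain scanset: only the characters listed in the brackets match. */
    return sscanf(input, "%15[0123456789]", out) == 1;
}

/* Hypothetical helper: copies everything before the first comma into
 * `out`, which must hold at least 16 bytes.
 * Returns 1 on a match, 0 if `input` is empty or starts with a comma. */
int up_to_comma(const char *input, char out[16])
{
    /* Negated scanset: the circumflex means "any character except ','". */
    return sscanf(input, "%15[^,]", out) == 1;
}
```

Calling `leading_digits("2015-05-18", buf)` stops at the first `-`, while `up_to_comma("alpha,beta", buf)` stops at the comma. Both fail when the very first character is outside the scanset, because the conversion must match a nonempty sequence.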

Sourav Ghosh
  • This OP just posted another question about the same thing. – Iharob Al Asimi May 18 '15 at 13:48
  • @iharob, whoops, did I double post? – user1717828 May 18 '15 at 13:50
  • @user1717828 , AFAIK, you can't. You'll have to wait 90 mins. I guess iharob has two tabs open with the same question? :-) – Spikatrix May 18 '15 at 13:51
  • @user1717828 no, it's not the same question, it's just about achieving the same thing. This is a valid question though, and Sourav just gave an excellent answer. My point is that you can do it this way too, but it will easily fail. – Iharob Al Asimi May 18 '15 at 13:54
  • @SouravGhosh, thanks very much! I'll mark as answered after 10 minutes. Can you post a link to the standard you got this from? I tried searching but apparently [StackOverflow thinks this is a stupid question.](http://stackoverflow.com/questions/11504312/where-to-get-the-latest-ansi-c-standard-document) – user1717828 May 18 '15 at 13:57
  • @user1717828 , Google for "n1570". It is the draft of the C11 standard: http://www.iso-9899.info/n1570.html – Spikatrix May 18 '15 at 13:58
  • @user1717828 added the link. :-) – Sourav Ghosh May 18 '15 at 14:01
3

It means "match anything that is not a <". It's not a good idea to do that without specifying a maximum field width, since an unbounded %[ can write past the end of the destination buffer. If your destination buffer can hold, say, 100 characters, then

char extracted_string[100];
sscanf(string, "<title>%99[^<]</title>", extracted_string);

would be a better solution.
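As a rough sketch of how the bounded version might be wrapped and checked, the function below tests the return value of sscanf() before the buffer is used. The name `extract_title` is hypothetical, and the fixed 100-byte buffer simply mirrors the example above.

```c
#include <stdio.h>

/* Hypothetical helper: extracts the text that follows a leading "<title>"
 * up to (not including) the next '<'.
 * `out` must have room for at least 100 bytes.
 * Returns 1 on success, 0 if `input` does not start with "<title>"
 * followed by at least one character other than '<'. */
int extract_title(const char *input, char out[100])
{
    /* sscanf returns the number of successful conversions, so 1 means
     * the scanset matched. Literal text placed after the last conversion
     * (such as "</title>") does not affect the return value, so it is
     * left out of the format here. */
    return sscanf(input, "<title>%99[^<]", out) == 1;
}
```

The width of 99 leaves one byte for the terminating null, so a title longer than the buffer is truncated instead of overflowing it. Note that an empty title (`<title></title>`) counts as a failure, because %[ needs at least one matching character.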

Using strstr() for this purpose allows you to actually make extracted_string dynamic.
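One way the strstr() approach could look is sketched below. The function name `extract_title_dynamic` is an assumption made for illustration, and the sketch only handles the first <title> element with exactly these lowercase tags.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of the text between
 * the first "<title>" and the following "</title>".
 * Returns NULL if either tag is missing or if allocation fails.
 * The caller is responsible for calling free() on the result. */
char *extract_title_dynamic(const char *input)
{
    static const char open_tag[] = "<title>";
    static const char close_tag[] = "</title>";

    const char *start = strstr(input, open_tag);
    if (start == NULL)
        return NULL;
    start += sizeof open_tag - 1;          /* skip past "<title>" */

    const char *end = strstr(start, close_tag);
    if (end == NULL)
        return NULL;

    size_t length = (size_t)(end - start); /* may be 0 for an empty title */
    char *title = malloc(length + 1);
    if (title == NULL)
        return NULL;

    memcpy(title, start, length);
    title[length] = '\0';
    return title;
}
```

Because the length is computed before allocating, the result is exactly as large as the title needs, and an empty title comes back as an empty string instead of a failure.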

Iharob Al Asimi
1

This link explains the [ and ^ usage in the scanf family of functions

(emphasis mine)

http://www.cdf.toronto.edu/~ajr/209/notes/printf.html


[

Matches a nonempty sequence of characters from the specified set of accepted characters; the next pointer must be a pointer to char, and there must be enough room for all the characters in the string, plus a terminating null byte. The usual skip of leading white space is suppressed. The string is to be made up of characters in (or not in) a particular set; the set is defined by the characters between the open bracket [ character and a close bracket ] character. The set excludes those characters if the first character after the open bracket is a circumflex (^). To include a close bracket in the set, make it the first character after the open bracket or the circumflex; any other position will end the set. The hyphen character - is also special; when placed between two other characters, it adds all intervening characters to the set. To include a hyphen, make it the last character before the final close bracket. For instance, [^]0-9-] means the set "everything except close bracket, zero through nine, and hyphen". The string ends with the appearance of a character not in the (or, with a circumflex, in) set or when the field width runs out.
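The [^]0-9-] set from the quoted text can be tried out with a sketch like the following. The helper name and the 32-byte buffer are assumptions for illustration. Also note that ISO C leaves the meaning of a hyphen between two characters implementation-defined; treating 0-9 as a range is the behaviour the quoted page describes and what glibc does.

```c
#include <stdio.h>

/* Hypothetical helper: copies the leading characters of `input` that are
 * not ']', a digit, or '-' into `out`, which must hold at least 32 bytes.
 * Returns 1 if at least one character was copied, 0 otherwise. */
int before_bracket_digit_or_hyphen(const char *input, char out[32])
{
    /* ']' directly after '^' is a member of the set, not its end.
     * "0-9" is read as a range on implementations such as glibc
     * (implementation-defined in ISO C).
     * The final '-' just before the closing ']' is a literal hyphen. */
    return sscanf(input, "%31[^]0-9-]", out) == 1;
}
```

For example, passing `"abc]def"` copies `abc`, and passing `"a-b"` copies only `a`.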

user3629249
  • Cool man, thanks. This is going to sound like a vague request, but can you tell me how you found it? That is, how did you know to reference this particular page, and how did you get there... if that makes any sense... – user1717828 May 19 '15 at 23:18
  • I'm on an Ubuntu Linux system. I started with 'man sscanf', then googled 'sscanf format specifiers'. I read several web pages before finding a clear definition that I could use in the answer and link to. – user3629249 May 20 '15 at 00:11