
I'm encountering rather strange behavior when performing a sscanf. I'm currently working in C on a Windows 7 machine.

I have the following:

if( sscanf( str, "%1[a-zA-Z]%31[a-zA-Z+.-]%n", &scheme[ 0 ], &scheme[ 1 ], &num_chars ) >= 1 )
  {
  return( num_chars );
  }

The str variable is a large input string, potentially longer than 32 characters. The scheme variable is passed as an argument to the wrapping function; it's a 32-character array.

I can easily do this with a couple of scanfs or two separate variables. I was just curious as to why this doesn't work as is.

Edit:
When the error occurred, str contained "tel-net" (I was testing the '-'), and the scheme string ended up with basically no usable characters.

Solution:
I figured out what the problem was; it was actually not a scanf issue at all.

This is how I declared the scheme variable:

IOP_uri_scheme_type   * scheme_str;

IOP_uri_scheme_type was declared as follows:

typedef char    IOP_uri_scheme_type[ IOP_URI_MAX_SCHEME_SZ ];  // Size = 32

The problem was the indexing: because scheme is a pointer to a 32-byte array, scheme[ 1 ] was actually jumping over the entire block (all 32 bytes) rather than a single character as I was expecting. So technically the scanf was written correctly to begin with (minus the %n thing).

One possible way I can solve this is by casting scheme to (char *) first, by directly manipulating the pointer value, by dereferencing it, or by just not using a pointer, which I don't need anyway.

Thanks for everyone's help.

  • Is `num_chars` a pointer? – Kerrek SB May 14 '14 at 22:48
  • When posting a question, instead of just saying "getting strange behaviour", it's best to describe exactly what behaviour you are getting and what you were expecting (preferably with an exact example for `str` that demonstrates the unexpected output). – M.M May 14 '14 at 23:11

2 Answers


It appears that you are trying to use regular expressions inside sscanf. As far as I know, sscanf does not have any support for regular expressions.

merlin2011
  • You can do things with scanf that are similar to regular expressions... In fact this was based off of one. I have similar expressions scattered across my code (parsing a URI); however, this is the only instance where I used the same output variable twice, and I believe that is where the problem lies. This compiles and runs, it just doesn't populate the output string properly – Justin Youngheim May 14 '14 at 22:55
  • @user3638657, Take a look at [this answer](http://stackoverflow.com/a/15664770/391161). You might be able to do some tricks that look like it, but I'm not sure if the expression in your question falls under those tricks. – merlin2011 May 14 '14 at 22:58

Here is a test suite I made for this case (with size reduced for readability):

#include <stdio.h>

int main()
{
    char str[] = "tel-net";
    char scheme[13] = { 0 };
    int num_chars;
    int result = sscanf( str, "%1[a-zA-Z]%11[a-zA-Z+.-]%n",
                            &scheme[ 0 ], &scheme[ 1 ], &num_chars );

    printf("result = %d\n", result);
    printf("scheme = '%s'\n", scheme);

    printf("scheme = ");
    for (size_t ii = 0; ii < sizeof scheme; ++ii)
        printf("%02x ", (unsigned char)scheme[ii]);
    printf("\n");

    if ( result == 2 )
        printf("num_chars = %d\n", num_chars);

    return 0;
}

where the output is:

result = 2
scheme = 'tel-net'
scheme = 74 65 6c 2d 6e 65 74 00 00 00 00 00 00
num_chars = 7

Can you post your output?

Note that your program has a bug, since the %n will not be processed if the second %[ conversion fails. You can only return num_chars if the return value is exactly 2.

Regarding the "regular expressions": according to the C standard it is implementation-defined what happens when you use a hyphen inside the [ ] specifier like this. Your compiler (plus C library etc.) may or may not support the usage you are trying. Check your compiler's documentation of scanf to see what it says about this case.

NB. I originally posted an answer saying it was undefined to read into overlapping objects. However, I think that is actually false, and it is fine because the arguments are processed in order (and the standard does not say that it is undefined).

M.M
  • scheme contains the letters 't' '\0' and then garbage after I execute a run through. – Justin Youngheim May 15 '14 at 13:09
  • I'll also mention I can use two separate variables as output, so: sscanf( str, "%1[a-zA-Z]%31[a-zA-Z+.-]", &scheme1[ 0 ], &scheme2[ 1 ] ); That works as expected, with the 't' in scheme1 and the 'garbage' + "el-net" in scheme2. – Justin Youngheim May 15 '14 at 13:19
  • OK, looks like you solved your problem (`scheme` was actually not a 32-character array, it was a pointer to one). – M.M May 15 '14 at 21:06