1

I have just started learning C after coding for a while in Java and Python.

I was wondering how I could "validate" a string input (check whether it meets certain criteria), and I stumbled upon the sscanf() function.

I had the impression that it works somewhat like regular expressions; however, I couldn't figure out how to build more complex queries with it.

For example, let's say I have the following string:

char str[] = "Santa-monica 123";

I want to use sscanf() to check if the string has only letters, numbers and dashes in it.

Could someone please elaborate?

Remy Lebeau

6 Answers

2

The fact that sscanf allows something that looks a bit like a character class by no means implies that it is anything at all like a regular expression library. In fact, POSIX doesn't even require the scanf functions to accept character ranges inside character classes (the C standard makes the meaning of a - between two characters in a scanset implementation-defined), although I suspect that it will work fine on any implementation you will run into.

But the scanning problem you have does not require regular expressions, either. All you need is a repeated character class match, and sscanf can certainly do that:

#include <stdbool.h>
#include <stdio.h>

bool check_string(const char* s) {
  int n = 0;
  /* skip the longest run of valid characters, then record the offset */
  sscanf(s, "%*[-a-zA-Z0-9]%n", &n);
  return s[n] == 0;
}

The idea behind that scanf format is that the first conversion matches and discards the longest initial sequence consisting of valid characters. (It fails if the first character is invalid; in that case the %n is never reached and n stays 0, and since s[0] is not a NUL the function correctly returns false. Thanks to @chux for pointing that out.) If the conversion succeeds, %n sets n to the current scan point, which is the offset of the next character. If that character is a NUL, then all the characters were good. (This version returns true for the empty string, since it contains no illegal characters. If you want the empty string to fail, change the return condition to return n && s[n] == 0;)
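Since the meaning of a - between two characters in a scanset is implementation-defined, a fully portable variant can spell out every accepted character instead of using ranges. This is only a sketch: the name check_string_portable is hypothetical, and otherwise it behaves like check_string, including accepting the empty string.

```c
#include <stdbool.h>
#include <stdio.h>

/* Same idea as check_string, but the scanset lists every accepted
   character explicitly instead of using ranges such as a-z, whose
   meaning inside a scanset is implementation-defined. A '-' placed
   first in the scanset is always taken literally. */
bool check_string_portable(const char *s) {
  int n = 0;  /* stays 0 if s is empty or its first character is invalid */
  sscanf(s,
         "%*[-0123456789"
         "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
         "abcdefghijklmnopqrstuvwxyz]%n",
         &n);
  return s[n] == '\0';  /* true only if the scan reached the terminating NUL */
}
```

For example, check_string_portable("Santa-monica 123") returns false because scanning stops at the space, while check_string_portable("Santa-monica-123") returns true.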

You could also do this with the POSIX regex library (or any more sophisticated library, if you prefer, but the POSIX library is usually available without additional work, even though it is not part of standard C). This requires a little more code in order to compile the regular expression. For efficiency, the following compiles the regex only once, but for simplicity I left out the synchronization needed to avoid data races during initialization, so don't use this in a multithreaded application.

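#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

bool check_string(const char* s) {
  static regex_t* re_ptr = NULL;  /* non-NULL once the regex has been compiled */
  static regex_t re;
  if (!re_ptr) {
    /* REG_NOSUB: we only need to know whether the string matched */
    if (regcomp(&re, "^[[:alnum:]-]*$", REG_EXTENDED | REG_NOSUB) != 0)
      return false;  /* compilation failed; never call regexec on it */
    re_ptr = &re;
  }
  return regexec(re_ptr, s, 0, NULL, 0) == 0;
}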
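If you do need this in a multithreaded program, one way to add the missing synchronization is POSIX pthread_once, which runs an initializer exactly once no matter how many threads call it. A sketch, assuming a POSIX system (older toolchains may need -pthread when compiling); the names check_string_mt and compile_re are hypothetical, and treating a failed regcomp as "invalid" is an arbitrary choice.

```c
#include <pthread.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

static regex_t re;                 /* compiled pattern shared by all threads */
static bool re_ok = false;         /* set by compile_re: did regcomp succeed? */
static pthread_once_t re_once = PTHREAD_ONCE_INIT;

/* Called exactly once, via pthread_once, before the first match. */
static void compile_re(void) {
  re_ok = regcomp(&re, "^[[:alnum:]-]*$", REG_EXTENDED | REG_NOSUB) == 0;
}

bool check_string_mt(const char *s) {
  pthread_once(&re_once, compile_re);  /* also makes re and re_ok visible */
  if (!re_ok)
    return false;  /* pattern failed to compile: report "invalid" */
  return regexec(&re, s, 0, NULL, 0) == 0;
}
```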
rici
1

I want to use sscanf() to check if the string has only letters, numbers and dashes in it.

A variation of @rici's good answer.

Create a scanset for letters, numbers and dashes.

//v              The * indicates to scan, but not save the result.
//  v            Dash (or minus sign), best to list first.
"%*[-0-9A-Za-z]"
//      ^^^^^^   Letters a-z, both cases
//   ^^^         Digits  

Use "%n" to detect how far the scan went.

Now we can determine whether

  1. scanning stopped at the null character (the whole string is valid), or

  2. scanning stopped at an invalid character.


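int n = 0;
sscanf(str, "%*[-0-9A-Za-z]%n", &n);  // n = offset where scanning stopped (stays 0 if nothing matched)

bool success = (str[n] == '\0');      // true only if scanning reached the end of the string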
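Because %n records exactly where scanning stopped, the same call can also report which character was rejected, which is handy for error messages. A small sketch; first_invalid is a hypothetical name, and returning -1 for a fully valid string is an arbitrary convention.

```c
#include <stdio.h>

/* Returns the offset of the first character that is not a letter,
   digit or dash, or -1 if the whole string is valid. */
int first_invalid(const char *str) {
  int n = 0;  /* remains 0 if the very first character is invalid */
  sscanf(str, "%*[-0-9A-Za-z]%n", &n);
  return str[n] == '\0' ? -1 : n;
}
```

With the string from the question, first_invalid("Santa-monica 123") returns 12, the offset of the space.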
chux - Reinstate Monica
0

sscanf does not have this functionality; the argument you are referring to is a format specifier and is not meant for validation. See here: https://www.tutorialspoint.com/c_standard_library/c_function_sscanf.htm

0

As also mentioned, sscanf is meant for a different job; for more information, see this link. You can loop over the string and use isalpha and isdigit to check whether each character is a letter or a digit.

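    #include <ctype.h>
    #include <stdio.h>

    int main(void) {
        char str[] = "Santa-monica 123";
        for (int i = 0; str[i] != '\0'; i++) {
            unsigned char c = (unsigned char)str[i]; // ctype functions need a non-negative value
            if (!isalpha(c) && !isdigit(c) && c != '-')
                printf("wrong character '%c'\n", c); // this will be printed for spaces too
        }
        return 0;
    }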
hanie
0

I want to ... check if the string has only letters, numbers and dashes in it.

In C that's traditionally done with isalnum(3) and friends.

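#include <ctype.h>
#include <stdbool.h>

bool valid( const char str[] ) {
  for( const char *p = str; *p != '\0'; p++ ) {
    /* cast: isalnum requires a value representable as unsigned char */
    if( ! (isalnum((unsigned char)*p) || *p == '-') )
      return false;
  }
  return true;
}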
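Another standard-library option, not part of this answer, is strspn from <string.h>, which returns the length of the initial run of characters drawn from a given set. A sketch; valid_strspn and allowed are hypothetical names. Unlike isalnum, this accepts only the characters listed, regardless of the current locale.

```c
#include <stdbool.h>
#include <string.h>

/* Every character the string is allowed to contain. */
static const char allowed[] =
    "-0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz";

/* strspn gives the length of the leading run of allowed characters;
   if that run ends at the terminating NUL, the whole string is valid. */
bool valid_strspn(const char *s) {
  return s[strspn(s, allowed)] == '\0';
}
```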

You can also use your friendly neighborhood regex(3), but you'll find that requires a surprising amount of code for a simple scan.

James K. Lowden
  • How many lines do you consider surprising? An include, a declaration and two lines of code? http://coliru.stacked-crooked.com/a/3f2127b87da802a7 – rici Apr 07 '20 at 01:11
  • @rici, fair point, well done. I was thinking of what's needed to pull out matched substrings, but of course that's not the OP's question. – James K. Lowden Apr 07 '20 at 22:08
-1

After retrieving the value with sscanf(), you can use a regular expression to validate it.

Please see Regular Expression in C
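A minimal sketch of that two-step approach, assuming the input is a line such as "name: Santa-monica-123" (both that input format and the function name extract_and_validate are hypothetical): sscanf pulls out the field, then a POSIX regex checks it. The pattern uses + rather than *, so an empty field is rejected, and for simplicity the regex is recompiled on every call.

```c
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/* Extracts the text after "name:" into out (at most 63 chars plus NUL),
   then checks that it consists only of letters, digits and dashes. */
bool extract_and_validate(const char *line, char out[64]) {
  if (sscanf(line, "name: %63[^\n]", out) != 1)
    return false;  /* the field was not found */

  regex_t re;
  if (regcomp(&re, "^[[:alnum:]-]+$", REG_EXTENDED | REG_NOSUB) != 0)
    return false;  /* pattern failed to compile */
  bool ok = regexec(&re, out, 0, NULL, 0) == 0;
  regfree(&re);    /* release what regcomp allocated */
  return ok;
}
```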