4

I want to write a simple date reader using sscanf that accepts input of the form:

"dd/mm/yyyy"

Both the "dd" and "mm" fields can be at most 2 digits long (e.g. 0, 6 or 11, but not 123). The "yyyy" field can be either 0 or a four-digit field. A value of 0 in any of these three fields means that the system's day, month or year has to be used instead.

That format must be strict: if the input doesn't fit the pattern, the user must be notified.
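The "0 means take the system value" rule described above could be sketched as follows. This is only an illustration: the helper name `fill_from_system_date` is hypothetical, and it assumes the three fields have already been parsed into unsigned integers.

```cpp
#include <ctime>

// Hypothetical helper (not part of the original code): replaces every field
// that is 0 with the corresponding field of the current local date.
void fill_from_system_date(unsigned& d, unsigned& m, unsigned& y) {
    const std::time_t now = std::time(nullptr);
    const std::tm* today = std::localtime(&now);  // note: not thread-safe
    if (today == nullptr) return;                 // leave the fields untouched

    if (d == 0) d = static_cast<unsigned>(today->tm_mday);
    if (m == 0) m = static_cast<unsigned>(today->tm_mon) + 1;      // tm_mon is 0-based
    if (y == 0) y = static_cast<unsigned>(today->tm_year) + 1900;  // tm_year counts from 1900
}
```

Non-zero fields are left as they are, so this can be called unconditionally after a successful parse.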

My attempt is:

unsigned d, m, y;   // %u stores into unsigned int
char const* input = "23/7/1990";

int n = sscanf(input, "%2u/%2u/%4u", &d, &m, &y);

if (n != 3) throw InvalidDate("Invalid format");

// Fill 0 values with system date.
// Check date correctness with `mktime` and `localtime`.

The problem is that this sscanf format accepts invalid inputs such as:

char const* invalid1 = "23/ 12/ 1990";
char const* invalid2 = "23/12/1990/123whatever.......";

So, are there any tricks or modifiers to reject leading whitespace before integers, to mark the end of the string, or to cause a detectable failure if more input follows?

For the last case (invalid2; detectable failure at the end of string), a possible solution would be:

unsigned d, m, y;   // %u stores into unsigned int
char trick;

char const* input = "23/7/1990";

int n = sscanf(input, "%2u/%2u/%4u%c", &d, &m, &y, &trick);

// If a fourth field is filled, the input was too long.
if (n != 3) throw InvalidDate("Invalid format");

// Fill 0 values with system date.

But I don't know whether there is a better way to detect the end of the string. Moreover, this format (with the trailing '%c') makes sscanf stop early on valid dates: for "23/6/1990" the last char is never filled (and if scanf were used instead of sscanf, the stream's end-of-file indicator would be set). I have even tried "%2u/%2u/%4u\0", but the compiler warns about an "embedded \0 in format".

So, what is the best solution without using regular expressions or stringstream?

By the way, are there other ways to "cheat" sscanf?
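For the end-of-string detection asked about above, one commonly used alternative to the extra `%c` is the standard `%n` directive, which stores the number of characters consumed so far and is not counted in the return value. A minimal sketch, where the function name `parse_with_end_check` is made up for illustration:

```cpp
#include <cstdio>

// Hypothetical helper: true when the input matches dd/mm/yyyy and nothing
// follows the year. %n records how many characters sscanf has consumed.
bool parse_with_end_check(const char* input, unsigned& d, unsigned& m, unsigned& y) {
    int consumed = 0;
    const int n = std::sscanf(input, "%2u/%2u/%4u%n", &d, &m, &y, &consumed);

    // n == 3 guarantees that %n was reached, so `consumed` is valid here.
    return n == 3 && input[consumed] == '\0';
}
```

With this approach a return value of 3 is the normal success case, so there is no "invalid state" to interpret. It does not address the whitespace problem: "23/ 12/ 1990" is still accepted, because `%u` skips leading blanks.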

manlio
ABu
  • Although `scanf` accepts them, your variables (`d m y`) still get filled with the correct values. What's your concern? Do you want to detect badly formatted input? – Emadpres May 16 '15 at 12:36
  • Yes. I want to detect bad-formatted input to notify the user. I have modified my question to make it explicit. – ABu May 16 '15 at 12:37
  • 6
    The `sscanf` function is more liberal than you want to tolerate, so do not use it. Instead, you'll have to parse the data manually. One middle-ground option is to qualify the input with a regular expression first and, if it matches, use `sscanf` to pick out the components (but you could do that with the same regex as well). – mah May 16 '15 at 12:44
  • as I read your question, you want to catch day/month/year values that are out of range. (note: 0 for month or day is not a valid value). After reading in the values, simply check their value against the valid range for that value. if the value is not valid, loop back to let the user try again. (note: never trust what a user enters) – user3629249 May 16 '15 at 15:51
  • The format expression could be "%2u/%2u/%4u%*c". Then the last character (which, for user input, would probably be a newline) can be consumed without any related parameter in the parameter list, as the '*' causes the %c to be consumed but ignored. – user3629249 May 16 '15 at 15:56
  • regarding this line: 'if (fields != 3) throw InvalidDate("Invalid format");' There are 4 input/format converters in the format string. So, it 'should' never be 3 returned from scanf(). Suggest checking for 4, even if using the suggested: "%2u/%2u/%4u%*c" format string. – user3629249 May 16 '15 at 16:06
  • The value returned by `sscanf` is the number of assignments made, not the number of "conversions" made, so the `*` modifier doesn't increment the assignment counter. On the other hand, I need to know whether there are trailing characters; if so, an error must be thrown. So if 4 fields have been read, the input was wrong. I'm aware that a return value of 3 represents "an invalid state", but since there are no side effects (as opposed to `scanf`, which would leave the stream's end-of-file indicator set), it's a not-so-bad option. – ABu May 16 '15 at 16:37

5 Answers

1

You can use the Boost.Regex library, which handles this kind of validation well. See the code below:

#include <boost/regex.hpp>
#include <iostream>
#include <string>

int main()
{
    // Expression to match: 1-2 digit day and month, then a 4-digit year or a single 0
    boost::regex e("^(\\d{1,2})/(\\d{1,2})/(\\d{4}|0)$");

    // Results are here
    boost::match_results<std::string::const_iterator>  results;

    std::string val_to_match = "1/11/1990";
    if (boost::regex_search(val_to_match, results, e) && results.size() == 4) {
        std::cout << "Matched "  << results[0] << std::endl; 
        int i = 1;
        while (i < 4) {
            std::cout << "Value: " << i <<  "  "<< results[i] << std::endl;
            i++;
        }
    } else {
        std::cout << "Couldn't match \n";
    }

    return 0;
}
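If Boost is not available, the same idea can be sketched with `std::regex` from the standard library (assuming a C++11 compiler). The function name `match_date` is hypothetical, and the year group here also accepts a single 0, as the question requires.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: validates "dd/mm/yyyy" with std::regex and extracts
// the three numbers. regex_match only succeeds if the whole string matches,
// so no ^ and $ anchors are needed.
bool match_date(const std::string& text, unsigned& d, unsigned& m, unsigned& y) {
    static const std::regex pattern(R"((\d{1,2})/(\d{1,2})/(\d{4}|0))");

    std::smatch parts;
    if (!std::regex_match(text, parts, pattern)) return false;

    d = static_cast<unsigned>(std::stoul(parts[1].str()));
    m = static_cast<unsigned>(std::stoul(parts[2].str()));
    y = static_cast<unsigned>(std::stoul(parts[3].str()));
    return true;
}
```

Because `\d` never matches blanks, an input such as "23/ 12/ 1990" is rejected without any extra whitespace scan.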
Nipun Talukdar
1

I modified your code and got this working:

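 void parseDate(const char *date) {

      char trick;
      unsigned d, m, y;   // %u stores into unsigned int
      int n = sscanf(date, "%2u/%2u/%4u%c", &d, &m, &y, &trick);

      // n == 3 means all three numbers were read and nothing followed them.
      // y is only inspected when n == 3, so it is always initialized here.
      if (n != 3 || y < 1000)
           puts("Invalid format!");
      else
           printf("%u %u %u\n", d, m, y);
 }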

You mentioned that the year can be either zero or a four-digit number, so I modified your code to accept only 1000 to 9999 (note that this also rejects a year of 0). Otherwise, a short year such as the one in 23/7/19 would be accepted as 19.

Tested this one and put the output to a file.

Results:

Sample date: 23/7/1990
Output: 23 7 1990

Sample date: 23/12/1990/123whatever.......
Output: Invalid format!

Sample date: 23/ 12/ 1990
Output: 23 12 1990

Sample date: 23/12/19a90
Output: Invalid format!

Sample date: 2a/1
Output: Invalid format!

Sample date: a23/12/1990
Output: Invalid format!

Sample date: 23/12/199000
Output: Invalid format!

You can refer to this thread: How to parse and validate a date in std::string in C++?. One answer there suggests using strptime.

Community
raymelfrancisco
  • 1
    Interesting, but I think `strptime` is not a `std` function, but POSIX. As for checking whether the year is 0 or a 4-digit string, it's true that a 3-digit year (for example) violates the format specification, but that is tested afterwards, since we restrict valid dates (not valid formats, though) to those after the year 1900. Anyway, my question didn't specify that (I spoke only about the input format requirements, not about "date semantics"), so I'll change my answer to add your code (although it won't be in my real code). – ABu May 16 '15 at 18:15
  • 1
    Your check (y > 10000) is unnecessary, because it's already handled by the `sscanf` format: if the year is more than 4 digits long, the fifth digit is stored in the `trick` variable and the condition `n != 3` will be triggered. – ABu May 16 '15 at 18:19
  • @Peregring-lk I didn't notice that unnecessary check, thank you for pointing it out. I edited my answer, removing `(y > 1000)` to simplify things. – raymelfrancisco May 17 '15 at 11:59
  • @Peregring-lk I restricted years to be over 1000 because `%4u` in `sscanf` gets any integer it can get. `23/7/1a990` will have a year of `1`, `23/7/19b0` will have a year of `19`, and without the restriction, the format will still be valid even if it is obviously invalid. I'll try to find another solution for those cases. – raymelfrancisco May 17 '15 at 12:00
  • But %4u can't get "23/7/19004". It takes "1900", and the remaining 4 is taken by the last "%c". – ABu May 17 '15 at 12:17
  • @Peregring-lk Do you want to accept more than 4 digits in year? – raymelfrancisco May 17 '15 at 12:23
  • If that's the case, then changing `%4u` to `%u` will suffice. `23/7/19004` will be a valid format. – raymelfrancisco May 17 '15 at 12:31
  • 1
    No, I don't want to accept more than 4 digits. I want to accept the year 0, or a year with at most 4 digits. If the input has more (digits or other characters), an error should be reported. For this reason, I have an extra `%c`. If it is filled, something was wrong (the year had fewer than 4 digits followed by a non-digit character, or the year had 4 digits but more input followed). See my answer below. – ABu May 17 '15 at 13:17
  • Okay, I thought I misunderstood your problem. Yes, `%4u` in `sscanf` will get `1990` and `4` will be taken by `%c`. I'm sorry on the "any integer" I mentioned earlier. It should be `%4u` will only get any 1-digit to 4-digit integer it can. I'm sorry for some misunderstanding. – raymelfrancisco May 17 '15 at 13:32
1

How about this? You can use the %[^0-9] conversion specification to read the characters between two numbers.

#include <stdio.h>
#include <string.h>

void process_date(const char* input){
  unsigned d, m, y;   // %u stores into unsigned int
  char sep1[3], sep2[3], trick;
  int n;

  n = sscanf(
    input, "%2u%2[^0-9]%2u%2[^0-9]%4u%c",
    &d, sep1, &m, sep2, &y, &trick);

  // Exactly 5 assignments: three numbers and two separators, with no trailing character.
  if(!(n == 5 && strcmp(sep1, "/") == 0 && strcmp(sep2, "/") == 0)){
    fprintf(stderr, "Invalid format (input = %s).\n", input);
    return;
  }

  printf("d = %u, m = %u, y = %u.\n", d, m, y);
}

int main(){
  process_date("23/7/1990");
  process_date("23/12/1990");
  process_date("23/7/0");
  process_date("23/0/1990");
  process_date("0/7/1990");

  process_date("23/ 12/ 1990");
  process_date("23/12/1990/123whatever.......");
  process_date("123/7/1990");
  process_date("23/12/19a90");
  process_date("2a/1");
  process_date("a23/12/1990");
  process_date("23/12/199000");

  return 0;
}

Outputs:

d = 23, m = 7, y = 1990.
d = 23, m = 12, y = 1990.
d = 23, m = 7, y = 0.
d = 23, m = 0, y = 1990.
d = 0, m = 7, y = 1990.
Invalid format (input = 23/ 12/ 1990).
Invalid format (input = 23/12/1990/123whatever.......).
Invalid format (input = 123/7/1990).
Invalid format (input = 23/12/19a90).
Invalid format (input = 2a/1).
Invalid format (input = a23/12/1990).
Invalid format (input = 23/12/199000).
akinomyoga
0

How about something like this? It doesn't use sscanf, but as has been said in the comments, it'd be hard to make that function work as you want:

int d, m, y;

int date[3];        //holds day/month/year in its cells
int tokenCount = 0;
char* pc;
int result = 0;
char* pch = strtok(input, "/");

while (pch != NULL)
{
    if (strlen(pch) == 0)
    {
        throw InvalidDate("Invalid format");
    }

    //atoi is stupid, there's no way to tell whether the string didn't contain a valid integer or if it contained a zero
    result = strtol(pch, &pc, 10);
    if (*pc != 0)
    {
        throw InvalidDate("Invalid format");
    }

    if (tokenCount > 2)     //we got too many tokens
    {
        throw InvalidDate("Invalid format");
    }

    date[tokenCount] = result;
    tokenCount++;

    pch = strtok(NULL, "/");
}

if (tokenCount != 3)
{
    //not enough tokens were supplied
    throw InvalidDate("Invalid format");
}


d = date[0];
m = date[1];
y = date[2];

You can then do some more checking, such as whether the month is between 1 and 12.

One thing to bear in mind is that strtok modifies the string it receives, so make sure to make a copy.
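A minimal sketch of the copy-then-tokenize idea, assuming a C++11 compiler. The function name `split_date` and the use of `std::string` for the copy are illustrative choices, not part of the snippet above.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper: tokenizes a private copy of the input, so the caller's
// string (possibly a string literal) is never modified by strtok.
bool split_date(const char* input, long fields[3]) {
    std::string copy(input);  // strtok writes '\0' bytes into this buffer
    int count = 0;

    for (char* tok = std::strtok(&copy[0], "/"); tok != nullptr;
         tok = std::strtok(nullptr, "/")) {
        if (count == 3) return false;            // too many tokens

        char* end = nullptr;
        const long value = std::strtol(tok, &end, 10);
        if (end == tok || *end != '\0') return false;  // not a plain number

        fields[count++] = value;
    }
    return count == 3;                           // too few tokens otherwise
}
```

One caveat of this approach: `strtok` treats consecutive delimiters as a single one, so an input like "23//7/1990" still yields three tokens and passes. `strtol` also accepts leading blanks and a sign, so this is looser than the strict format in the question.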

user4520
  • I had a solution using only `strtol`, without a `while`, which worked fine, but I want to change it to `sscanf` because it is much shorter. Perhaps a solution would be adding a call to `strpbrk` to find whitespace. – ABu May 16 '15 at 13:04
  • @Peregring-lk Yes, `while` can be avoided and maybe it's even better since you can check the range at the same time. Well, good luck with `sscanf` then :) – user4520 May 16 '15 at 13:07
  • this will not work. The reason is the code is expecting a trailing character. So the third call to strtok() would return "1992x\0" – user3629249 May 16 '15 at 16:00
  • @user3629249 Can you elaborate? I've tested it and it worked fine for the test input the OP supplied. – user4520 May 16 '15 at 16:02
0

So, since everybody seems to agree there's no way to make sscanf fit this pattern any better, I think the best solution is:

char const* input = "23/7/1990";

unsigned d, m, y;   // %u stores into unsigned int

{ // Search blanks due to `sscanf` limitations.
    for (unsigned i = 0; i < 10 and input[i] != '\0'; ++i)
        if (isspace(static_cast<unsigned char>(input[i])))  // cast avoids UB for negative char values
           throw InvalidDate("Invalid format");

} { // Check format (with extra input detection).
    char trick;
    int n = sscanf(input, "%2u/%2u/%4u%c", &d, &m, &y, &trick);

    if (n != 3 or (y != 0 and y < 1000))
        throw InvalidDate("Invalid format");
}

// Fill 0 values with system date.
// Check date correctness with `mktime` and `localtime`.
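The last comment in the snippet mentions `mktime`. A minimal sketch of such a check is shown here, under the assumption that the year is 1970 or later (earlier dates may not be representable on every platform). The helper name `is_real_date` is hypothetical.

```cpp
#include <ctime>

// Hypothetical helper: true when d/m/y names a real calendar date.
// Assumes y >= 1970 so that mktime can represent the date.
bool is_real_date(unsigned d, unsigned m, unsigned y) {
    std::tm wanted = {};
    wanted.tm_mday  = static_cast<int>(d);
    wanted.tm_mon   = static_cast<int>(m) - 1;     // tm_mon is 0-based
    wanted.tm_year  = static_cast<int>(y) - 1900;  // tm_year counts from 1900
    wanted.tm_hour  = 12;                          // noon, away from DST switches
    wanted.tm_isdst = -1;                          // let the library decide DST

    std::tm normalized = wanted;
    if (std::mktime(&normalized) == static_cast<std::time_t>(-1))
        return false;

    // mktime rewrites out-of-range fields (31 Feb becomes 3 Mar), so any
    // difference means the input was not a real date.
    return normalized.tm_mday == wanted.tm_mday
        && normalized.tm_mon  == wanted.tm_mon
        && normalized.tm_year == wanted.tm_year;
}
```

This should run after the zero fields have been replaced with the system date, since a day or month of 0 is reported as invalid here.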

EDIT: Previously I used strpbrk to detect blanks (sscanf skips them before numbers). The problem with that solution was that strpbrk scans the complete input until it finds something, so a very long input without blanks would make the execution slow. Since I know the maximum allowable size of the input, I've replaced it with a for loop of at most 10 iterations that uses isspace.

Performance could of course be improved by throwing if '\0' is found too soon, but determining "too soon" within the for loop would be too verbose. So I left that work to sscanf, which keeps the first for loop well defined.

Any other complaints about this solution are very welcome.

ABu
  • Perhaps a little faster than calling the generic `strpbrk()` would be a simple `for (const char *p = input; *p; ++p) { if (isspace(*p)) throw InvalidDate("Invalid format"); }`. Note that `isspace()` looks for all whitespace characters. Since your input is highly well defined though, you could modify this loop to also parse out your components and thus avoid `sscanf()` completely. – mah May 16 '15 at 13:31
  • Yes, perhaps. Moreover, since I know the input cannot have more than 10 characters, it can be just a 10-trip loop, but parsing the input within that `for` could make the code very obfuscated. `sscanf` is cleaner and also won't do more than 11 loops (the last %c), and I'd bet `sscanf` is pretty fast despite its genericity. – ABu May 16 '15 at 14:19
  • @mah I've changed my solution accordingly. – ABu May 16 '15 at 15:42
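For comparison, the comments' suggestion to parse the components in the same pass and avoid `sscanf()` entirely could look roughly like this. It is only a sketch: the names `read_number` and `parse_date_by_hand` are made up, and it treats a four-digit "0000" the same as a single "0".

```cpp
#include <cctype>

// Hypothetical helper: reads 1..max_digits decimal digits at p, advancing p.
// Returns false when no digit is found at all.
static bool read_number(const char*& p, int max_digits, unsigned& out) {
    int count = 0;
    out = 0;
    while (count < max_digits && std::isdigit(static_cast<unsigned char>(*p))) {
        out = out * 10 + static_cast<unsigned>(*p - '0');
        ++p;
        ++count;
    }
    return count > 0;
}

// Hypothetical strict parser for "dd/mm/yyyy": no blanks, no signs, and
// nothing after the year. The year must be a single 0 or exactly 4 digits.
bool parse_date_by_hand(const char* input, unsigned& d, unsigned& m, unsigned& y) {
    const char* p = input;
    if (!read_number(p, 2, d) || *p++ != '/') return false;
    if (!read_number(p, 2, m) || *p++ != '/') return false;

    const char* year_start = p;
    if (!read_number(p, 4, y)) return false;

    const bool year_ok = (p - year_start == 4) || (p - year_start == 1 && y == 0);
    return year_ok && *p == '\0';
}
```

Since every character is inspected exactly once, no separate whitespace loop or trailing `%c` is needed, at the cost of a few more lines than the `sscanf` version.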