I need to validate a timestamp string for one of my embedded applications. The SDK does not provide regex.h so I need to come up with another solution.

I have been googling and found some lightweight regex alternatives on GitHub, but I wanted to see if there is a better or simpler alternative before I start integrating one of them into the build.

Any suggestions on how to write such a function in C? The string will have the format `YYYY-MM-DD HH:MM:SS`. I control this format too, so if another format is better I can adapt to it.

Fever
  • If you know the string is in `YYYY-MM-DD HH:MM:SS` format, why do you need to pattern match it? Probably you mean extracting the `Y`,`M`,`D`,`H`,`M`,`S` values? – nice_dev Nov 14 '18 at 19:07
  • Have you tried `strptime`? – KamilCuk Nov 14 '18 at 19:07
  • You're right. I misstated the question; it should have said "validate" instead of "pattern matching". I have changed the title now. – Fever Nov 15 '18 at 07:05
  • It is a very narrow and simple requirement; unless you will be validating other differently formatted strings, a general purpose matching/validating library will add a prohibitively large amount of code. Just read the delimited tokens and validate them - you will write perhaps more code, but that code will be smaller than any general purpose library code you might otherwise import. – Clifford Nov 15 '18 at 20:03

1 Answer

By "pattern-match" I assume you want to know if such a string is valid.

#include <stdbool.h>
#include <stdio.h>   // sscanf
#include <string.h>  // strlen, strncmp

bool is_leap_year(int year)
{
    return (year & 3) == 0 && ((year % 25) != 0 || (year & 15) == 0); // *)
}

// True if min <= value <= max.
bool in_range(int min, int value, int max)
{
    return min <= value && value <= max;
}

bool is_valid_timestamp(char const *datetime)
{
    int const days_per_month[] = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
    int y, m, d, h, min, sec;
    char separators[5];

    // Six numbers plus five separator characters make 11 conversions.
    // The month is range-checked before it is used as an array index.
    return strlen(datetime) == 19
        && sscanf(datetime, "%d%c%d%c%d%c%d%c%d%c%d", &y, &separators[0],
                  &m, &separators[1], &d, &separators[2], &h, &separators[3],
                  &min, &separators[4], &sec) == 11
        && in_range(0, y, 9999) && in_range(1, m, 12)
        && in_range(1, d, m == 2 && is_leap_year(y) ? 29 : days_per_month[m - 1])
        && in_range(0, h, 23) && in_range(0, min, 59) && in_range(0, sec, 59)
        && strncmp(separators, "-- ::", 5) == 0;
}

Adjust `in_range(0, y, 9999)` to whatever you consider a "valid" year.

*) https://stackoverflow.com/a/11595914/3975177
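
One caveat with the `sscanf` approach: `%d` skips leading whitespace and accepts a sign, and nothing checks what follows the last number, so inputs such as `2018- 1-01 00:00:00` or `2018-01-01 00:00:0x` are still 19 characters long and pass. If that matters, a stricter check can compare each character against a fixed layout, which also avoids `sscanf` entirely. A minimal sketch, where `is_valid_timestamp_strict`, `two_digits` and `is_leap` are names made up for this illustration:

```c
#include <stdbool.h>
#include <stddef.h>

/* Convert two characters that are already known to be digits. */
static int two_digits(char const *p)
{
    return (p[0] - '0') * 10 + (p[1] - '0');
}

/* Plain Gregorian rule, written with the textbook constants. */
static bool is_leap(int year)
{
    return year % 4 == 0 && (year % 100 != 0 || year % 400 == 0);
}

/* Hypothetical stricter validator: every position must match the layout. */
bool is_valid_timestamp_strict(char const *s)
{
    /* 'd' stands for a decimal digit; anything else must match literally. */
    static char const layout[] = "dddd-dd-dd dd:dd:dd";
    static int const days_per_month[] =
        { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
    size_t i;
    int year, month, day, max_day;

    if (s == NULL)
        return false;

    /* A too-short string fails here at its terminating '\0',
       so nothing past the end of the input is ever read. */
    for (i = 0; layout[i] != '\0'; ++i) {
        if (layout[i] == 'd') {
            if (s[i] < '0' || s[i] > '9')
                return false;
        } else if (s[i] != layout[i]) {
            return false;
        }
    }
    if (s[i] != '\0')          /* reject trailing characters */
        return false;

    year  = two_digits(s) * 100 + two_digits(s + 2);
    month = two_digits(s + 5);
    day   = two_digits(s + 8);

    if (month < 1 || month > 12)
        return false;

    max_day = (month == 2 && is_leap(year)) ? 29 : days_per_month[month - 1];

    return day >= 1 && day <= max_day
        && two_digits(s + 11) <= 23   /* hours   */
        && two_digits(s + 14) <= 59   /* minutes */
        && two_digits(s + 17) <= 59;  /* seconds */
}
```

With this version, `"2016-02-29 23:59:59"` is accepted, while `"2018-02-29 00:00:00"`, `"2018- 1-01 00:00:00"` and `"2018-01-01 00:00:0x"` are all rejected. Year `0000` is accepted here; add a range check if that is not what you want.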

Swordfish
  • I must say that I'm impressed. Works like a charm! Thank you very much for your time. – Fever Nov 15 '18 at 07:06
  • Why `year % 25` instead of `year % 100`? They both end up working the same, but `% 100` uses the actual value the formula is based on (Not years divisible by 100, except years divisible by 400). Just a preference to use the smallest number possible, or is there an efficiency gain I'm missing? I can certainly understand why you would avoid another division with `& 15` instead of `% 400` though. – Sam Skuce Nov 16 '18 at 18:34
  • Please have a look at the link in my answer: "The 100th year test utilizes modulo 25 instead of modulo 100. We can do this because 100 factors out to 2 x 2 x 5 x 5. Because the 4th year test already checks for factors of 4 we can eliminate that factor from 100, leaving 25. This optimization is probably insignificant to nearly every CPU implementation (as both 100 and 25 fit in 8-bits)." – Swordfish Nov 16 '18 at 18:45
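
For anyone who wants to convince themselves that the `% 25` / `& 15` form really agrees with the usual `% 100` / `% 400` rule, a brute-force comparison is a quick way to check. This is only a sketch; `is_leap_year_fast`, `is_leap_year_textbook` and `count_leap_year_mismatches` are names invented for the example:

```c
#include <stdbool.h>

/* The form used in the answer: bit masks for 4 and 16, modulo 25 for 100. */
bool is_leap_year_fast(int year)
{
    return (year & 3) == 0 && ((year % 25) != 0 || (year & 15) == 0);
}

/* The rule as it is usually stated. */
bool is_leap_year_textbook(int year)
{
    return year % 4 == 0 && (year % 100 != 0 || year % 400 == 0);
}

/* Count the years in [first, last] on which the two versions disagree. */
int count_leap_year_mismatches(int first, int last)
{
    int mismatches = 0;
    for (int year = first; year <= last; ++year) {
        if (is_leap_year_fast(year) != is_leap_year_textbook(year))
            ++mismatches;
    }
    return mismatches;
}
```

Calling `count_leap_year_mismatches(0, 9999)` returns 0. The reasoning: once a year is known to be a multiple of 4, it is a multiple of 100 exactly when it is also a multiple of 25, and a multiple of 100 is a multiple of 400 exactly when it is a multiple of 16.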