
I want to extract the day, month and year from a date string.

Example Date String: Thu, 30 Jul 2020 00:51:08 -0700 (PDT)

PDT here stands for Pacific Daylight Time. The UTC offset in the string (-0700) can change depending on the system time zone in effect when the file was created.

I need to write a C++ program to extract the day, month and year from this string.

Any thoughts on how to go about this?

MKS

2 Answers


This is a story of evolution. The correct answer greatly depends on your current toolset (how modern it is). And even if it is completely modern, there are still better tools coming.

Homo habilis

In C++98 we could stand upright. And we had tools to scan ints out of arrays of chars: scanf (and its string variant, sscanf) was the tool to do this. The result was not type safe, but we could scan ints and strings and then reinterpret those values as the components of a date: year, month and day. This might look something like this:

#include <cstdio>
#include <cstring>
#include <iostream>
#include <string>

int
main()
{
    using namespace std;

    string s = "Thu, 30 Jul 2020 00:51:08 -0700 (PDT)";
    char const* months[] = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
                            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"};
    char wd[4] = {};
    int d;
    char mon[4] = {};
    int y;
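    // %3s caps each read at 3 chars so "Thu," cannot overflow the 4-byte buffers;
    // the literal ',' in the format consumes the comma after the weekday.
    sscanf(s.c_str(), "%3s, %d %3s %d", wd, &d, mon, &y);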
    int m;
    for (m = 0; m < 12; ++m)
        if (strcmp(months[m], mon) == 0)
            break;
    ++m;
    cout << y << '\n';
    cout << m << '\n';
    cout << d << '\n';
}

This outputs:

2020
7
30

Notes:

  • The " 00:51:08 -0700 (PDT)" is never even parsed. It could be parsed. But it is a lot more work.
  • There's no error checking. This might be a valid date or might not.
  • There's no type safety. The results are just ints and if you mix them up, it's a run-time error, not a compile-time error.
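
The missing error checking is cheap to add, because sscanf returns the number of fields it converted. Here is a minimal sketch of that idea. The helper name parse_date_fields is made up for illustration and is not part of the answer above, and the day check is only a coarse range test, not a real calendar validation:

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical helper: extracts day, month (1-12) and year from a string
// such as "Thu, 30 Jul 2020 00:51:08 -0700 (PDT)".
// Returns false if the text does not match or the month name is unknown.
bool parse_date_fields(const char* s, int& y, int& m, int& d)
{
    static const char* const months[] = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"};
    char wd[4] = {};
    char mon[4] = {};
    // %3s reads at most 3 characters, so the 4-byte buffers cannot overflow.
    // sscanf returns the number of fields successfully converted.
    if (std::sscanf(s, "%3s, %d %3s %d", wd, &d, mon, &y) != 4)
        return false;
    for (m = 0; m < 12; ++m)
        if (std::strcmp(months[m], mon) == 0)
            break;
    if (m == 12)
        return false;           // unknown month name
    ++m;                        // convert zero-based index to 1-12
    return d >= 1 && d <= 31;   // coarse range check only
}
```

A caller would test the return value before touching y, m and d. The results are still plain ints, so the type-safety problem remains.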

Neanderthal

Using C++98, there's also a popular but non-standard solution: strptime.

#include <time.h>
#include <iostream>
#include <string>

int
main()
{
    using namespace std;

    string s = "Thu, 30 Jul 2020 00:51:08 -0700 (PDT)";
    tm tm = {};  // zero-initialize so fields strptime does not set are not garbage
    strptime(s.c_str(), "%a, %d %b %Y %T", &tm);
    cout << tm.tm_year + 1900 << '\n';
    cout << tm.tm_mon + 1 << '\n';
    cout << tm.tm_mday << '\n';
    cout << tm.tm_hour << '\n';
    cout << tm.tm_min << '\n';
    cout << tm.tm_sec << '\n';
}

strptime is in the POSIX standard, but not in the C or C++ standards. It is not provided by MS Visual Studio's runtime library, but it is widely available elsewhere, so it is a popular extension. And with good reason. It is much higher level, and puts the results into a struct tm: a type representing a date/time; the beginnings of type safety.

Output:

2020
7
30
0
51
8

There are still some problems:

  • " -0700 (PDT)" is never parsed. There's no portable way to ask strptime to do this (some implementations accept %z as an extension).
  • There are weird and inconsistent offsets on the different fields of tm. For example the month is zero-based and the day is one-based. But at least it knows how to parse the time too, and relatively easily.
  • Error checking is there but easy to ignore. strptime returns NULL if something bad happens.
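
Two of these problems can be worked around by hand. Below is a minimal sketch, assuming a POSIX system where strptime is declared in <time.h>. The helper name parse_with_offset is hypothetical. It checks strptime's return value, then uses the returned pointer (the first unparsed character) to read the numeric offset with sscanf:

```cpp
#include <time.h>
#include <cstdio>

// Hypothetical helper: parses the date/time with strptime (POSIX only), then
// reads the numeric UTC offset from the text strptime left unconsumed.
// Returns false on any parse failure.
// offset_minutes receives e.g. -420 for "-0700".
bool parse_with_offset(const char* s, struct tm& out, int& offset_minutes)
{
    struct tm t = {};   // zero-initialize so unparsed fields are not garbage
    // strptime returns a pointer to the first unparsed character, or NULL on failure.
    const char* rest = strptime(s, "%a, %d %b %Y %T", &t);
    if (rest == NULL)
        return false;
    char sign = 0;
    int hh = 0, mm = 0;
    // Expect something like " -0700": a sign, two hour digits, two minute digits.
    if (std::sscanf(rest, " %c%2d%2d", &sign, &hh, &mm) != 3)
        return false;
    if (sign != '+' && sign != '-')
        return false;
    offset_minutes = (hh * 60 + mm) * (sign == '-' ? -1 : 1);
    out = t;
    return true;
}
```

The "(PDT)" abbreviation is still ignored here, and the inconsistent tm field offsets are unchanged.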

Cro-Magnon

C++11 introduced std::get_time, effectively a C++ wrapper around strptime-style parsing that is part of the C++ standard itself:

#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

int
main()
{
    using namespace std;

    string s = "Thu, 30 Jul 2020 00:51:08 -0700 (PDT)";
    istringstream in{s};
    in.exceptions(ios::failbit);
    tm tm = {};  // zero-initialize so fields get_time does not set are not garbage
    in >> get_time(&tm, "%a, %d %b %Y %T");
    cout << tm.tm_year + 1900 << '\n';
    cout << tm.tm_mon + 1 << '\n';
    cout << tm.tm_mday << '\n';
    cout << tm.tm_hour << '\n';
    cout << tm.tm_min << '\n';
    cout << tm.tm_sec << '\n';
}

With a C++ wrapper you can parse from streams, which lets you opt in to an exception on parse failure. But it is still a thin wrapper, so the result is just a tm, with the same quirks as in the previous solution.

The output is the same as in the previous solution:

2020
7
30
0
51
8
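
If exceptions are not wanted, the stream state carries the same information. Here is a minimal sketch that reports a parse failure through the return value instead. The helper name try_get_time is hypothetical:

```cpp
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: parses with std::get_time and reports failure through
// the return value instead of an exception.
bool try_get_time(const std::string& s, std::tm& out)
{
    std::istringstream in(s);
    std::tm t = {};                 // zero-initialize all fields
    in >> std::get_time(&t, "%a, %d %b %Y %T");
    if (in.fail())                  // failbit is set when the text does not match
        return false;
    out = t;
    return true;
}
```

The output parameter is only written on success, so a caller can keep a previous value when parsing fails.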

Homo sapiens

Though the strongly typed std::chrono time_point / duration system was introduced in C++11, it was not until C++20 that it was integrated with the civil calendar, gaining get_time-like functionality and going far beyond it.

#include <chrono>
#include <iostream>
#include <sstream>
#include <string>

int
main()
{
    using namespace std;
    using namespace std::chrono;

    string s = "Thu, 30 Jul 2020 00:51:08 -0700 (PDT)";
    istringstream in{s};
    in.exceptions(ios::failbit);
    local_seconds t;
    in >> parse("%a, %d %b %Y %T %z (%Z)", t);
    auto td = floor<days>(t);
    year_month_day ymd{td};
    hh_mm_ss hms{t-td};
    cout << ymd << ' ' << hms << '\n';
    cout << ymd.year() << '\n';
    cout << ymd.month() << '\n';
    cout << ymd.day() << '\n';
    cout << hms.hours() << '\n';
    cout << hms.minutes() << '\n';
    cout << hms.seconds() << '\n';
}

Output:

2020-07-30 00:51:08
2020
Jul
30
0h
51min
8s

The first thing to notice is the much stronger type-safety. No longer is there a need to convert everything to ints to print it out. And no longer is it necessary to convert to ints to do other operations such as arithmetic and comparison.

For example ymd.year() has type std::chrono::year, not int. If necessary, one can explicitly convert between these two representations. But it is generally unnecessary, and akin to a risky reinterpret_cast.

There are no longer unintuitive biases such as 1900, or zero-based counts in unexpected places.

Output generally includes the units for easier debugging.

The " -0700 (PDT)" is parsed here! These values are not used in the results, but they must be present in the input, else there is a parse error. And if you want to get these values, they are available with very simple changes:

string abbrev;
minutes offset;
in >> parse("%a, %d %b %Y %T %z (%Z)", t, abbrev, offset);
...
cout << offset << '\n';
cout << abbrev << '\n';

Now the output includes:

-420min
PDT

If you need the fields in UTC, instead of in local time, that is one simple change:

sys_seconds t;

instead of:

local_seconds t;

Now the offset is subtracted from the parsed local time to produce a UTC time point (a std::chrono::time_point based on system_clock), and the output changes to:

2020-07-30 07:51:08
2020
Jul
30
7h
51min
8s

This allows you to easily parse local times plus offset directly into system_clock::time_point.
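
The arithmetic behind that change can be checked by hand with plain C++11 durations. This is only an illustrative sketch of the rule "UTC = local time minus UTC offset", applied to a time of day. The function name to_utc_time_of_day is hypothetical, and a result outside 0h to 24h (a day rollover) is not handled:

```cpp
#include <chrono>

// Illustrative only: applies "UTC = local time - UTC offset" to a time of day
// using the C++11 duration types. Does not wrap around midnight.
std::chrono::seconds to_utc_time_of_day(std::chrono::seconds local,
                                        std::chrono::minutes offset)
{
    return local - offset;   // minutes converts implicitly to seconds
}
```

With a local time of 00:51:08 and an offset of -420min, the result is 07:51:08, matching the output above.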

Though not shipping yet (as I write this), vendors are working on implementing this. And in the meantime you can get this functionality with a free, open-source, header-only C++20 <chrono> preview library that works with C++11/14/17. Just add #include "date/date.h" and using namespace date; and everything just works. Though with C++11/14 you will need to substitute hh_mm_ss<seconds> hms{t-td}; for hh_mm_ss hms{t-td}; (lack of CTAD).

Howard Hinnant
  • Using the approach described under Homo sapiens, I get the following errors in VS 2013 after including the latest version of the date.h file. Error 1: error C2059: syntax error : '}'. Error 2: error C2059: syntax error : '}'. Error 3: error C1075: end of file found before the left angle-bracket '<' at 'c:\date-master\include\date\date.h(3871)' was matched. – MKS Sep 14 '20 at 13:33
  • I tried it with VS 2019. It just worked fine. Would it be possible to get the below result in VS 2013 by not using the open-source library(date.h) as my current work doesn't allow me to upgrade above VS 2013? 2020-07-30 07:51:08 2020 Jul 30 7h 51min 8s – MKS Sep 14 '20 at 18:09
  • Is this the line the _first_ error is reported on: https://github.com/HowardHinnant/date/blob/master/include/date/date.h#L3871 ? – Howard Hinnant Sep 14 '20 at 18:24
  • Error 1 : Line Number 3829. – MKS Sep 14 '20 at 18:35
  • Error 2: Line Number 7942 – MKS Sep 14 '20 at 18:35
  • On this line: https://github.com/HowardHinnant/date/blob/master/include/date/date.h#L3689 Could you try putting in some `()` like so: `(w < 19)`? If that works for you, I'll push it to master. – Howard Hinnant Sep 14 '20 at 18:46
  • It didn't work. Well, with VS 2019. It works just fine without any change. The problem is only when I use it with VS 2013. – MKS Sep 14 '20 at 19:15
  • My best guess is that somewhere in here: https://github.com/HowardHinnant/date/blob/master/include/date/date.h#L3676-L3828 VS 2013 is interpreting some `<` as an "open-template-parameter-list" instead of as a "less-than". If you can find that spot, I'll fix it. – Howard Hinnant Sep 14 '20 at 19:20
#include <time.h>
char *strptime(const char *buf, const char *format, struct tm *tm);