I have a function in C++ that takes an input string representing a date in the format MM/DD/YYYY. The function uses the C (POSIX) regex implementation due to limitations of my environment. I am attempting to extract the year, month, and day from the string.

#include <stdarg.h>
#include <string.h>
#include <iostream>
#include <regex.h>
#include <sys/types.h> 

using namespace std;


void convertDate(string input)
{

    char pattern[100];
    regex_t preg[1];
    regmatch_t match[100];
    const char * reg_data = input.c_str();
    string year;
    string month;
    string day;

    strcpy(pattern, "^([0-9]{1,2})/([0-9]{1,2})/([0-9]{4})$");
    int rc = regcomp(preg, pattern, REG_EXTENDED); 
    rc=regexec(preg, reg_data, 100, match, 0);
    if( rc != REG_NOMATCH ) 
    {
       year = input.substr(match[3].rm_so, match[3].rm_eo);
       month = input.substr(match[1].rm_so, match[1].rm_eo);
       day = input.substr(match[2].rm_so, match[2].rm_eo);
       cout << year << endl;
       cout << month << endl;
       cout << day << endl;
    }

}

Here are some examples of input/output:

1) string input2 = "8/11/2014";
   convertDate(input2);

   2014
   8
   11/2

2) string input2 = "11/8/2014";
   convertDate(input2);

   2014
   11
   8/20

3) string input2 = "1/1/2014";
   convertDate(input2);

   2014
   1
   1/2

I'm not sure why the day is capturing a regex group of length 4, when the capture group states it should only be capturing 1 or 2 characters that are digits. Also, why would the day be having this issue, when the month is correct? They use the same logic, it looks like.

I used the documentation here

Ryan
Danzo
  • What compiler and version are you using? – NathanOliver Jun 08 '16 at 19:33
  • I'm using an online compiler that is using c++11. [See here](http://www.tutorialspoint.com/compile_cpp11_online.php) @NathanOliver – Danzo Jun 08 '16 at 19:39
  • OK. That is gcc 5.3.1. I asked because of [this](http://stackoverflow.com/questions/12530406/is-gcc-4-8-or-earlier-buggy-about-regular-expressions) – NathanOliver Jun 08 '16 at 19:41
  • I see. So this is not that bug, since the compiler is a more recent version than 4.8? @NathanOliver – Danzo Jun 08 '16 at 19:46
  • Pardon my ignorance, but since when does the standard C language have regular expressions? – Thomas Matthews Jun 08 '16 at 19:49
  • Yes it was not that "bug" – NathanOliver Jun 08 '16 at 19:49
  • @ThomasMatthews C standard doesn't have regex. [`<regex.h>` is from POSIX](http://pubs.opengroup.org/onlinepubs/009695399/functions/regcomp.html). Nevertheless, for C++11 we should use [C++'s standard `<regex>` library](http://en.cppreference.com/w/cpp/regex). – kennytm Jun 08 '16 at 19:52
  • @ThomasMatthews: Standard C does not have regexes (but Standard C++ does; see [List of standard headers in C and C++](https://stackoverflow.com/questions/2027991)). POSIX has a set of regular expression code described in [`<regex.h>`](http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/regex.h.html#tag_13_38): [`regcomp()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/regcomp.html) et al; presumably that's what's in use here (the calls shown are consistent with POSIX). – Jonathan Leffler Jun 08 '16 at 19:54
  • I am testing on the online compiler mentioned above. My actual code is running on IBM's Netezza platform via a UDF. It doesn't recognize C++'s standard regex library. @kennytm – Danzo Jun 08 '16 at 19:55
  • @Danzo, please clarify "The C implementation of regex". I'm trying to figure out the rules of the regular expression. Different libraries have different interpretations. – Thomas Matthews Jun 08 '16 at 19:57
  • @Danzo: Are you telling the C++ compiler to use C++11 on the Netezza platform? It may not be doing so by default. See if you can find a way to tell it to use C++11. – Jonathan Leffler Jun 08 '16 at 19:57
  • Also, you ignore the possibility of an error from the `regcomp()` call. You should check that too. – Jonathan Leffler Jun 08 '16 at 19:59

1 Answer

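You are using `.substr` incorrectly. Its second argument is the length of the substring, but you are passing the end index (`rm_eo`). The month only looks right because its match starts at offset 0, where the end index and the length coincide; the year looks right because `substr` clamps a length that runs past the end of the string. Try this instead: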

   day = input.substr(match[2].rm_so, match[2].rm_eo - match[2].rm_so);
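
The same fix applies to all three groups. As a rough sketch (the `DateParts` struct and the `group` and `parseDate` names are made up for illustration and are not from the question), the function could compute each length as `rm_eo - rm_so`, and also check the return value of `regcomp()` and release the compiled pattern with `regfree()`:

```cpp
#include <regex.h>   // POSIX regex API (not the C++11 <regex> header)
#include <string>

// Hypothetical helper type for illustration; not part of the original code.
struct DateParts {
    std::string year;
    std::string month;
    std::string day;
};

// Copy one capture group out of the input.
// std::string::substr takes (start, length), so the length is rm_eo - rm_so.
static std::string group(const std::string& input, const regmatch_t& m)
{
    return input.substr(m.rm_so, m.rm_eo - m.rm_so);
}

// Returns true and fills `out` when `input` looks like M/D/YYYY;
// returns false if the pattern fails to compile or the input does not match.
bool parseDate(const std::string& input, DateParts& out)
{
    regex_t preg;
    if (regcomp(&preg, "^([0-9]{1,2})/([0-9]{1,2})/([0-9]{4})$", REG_EXTENDED) != 0)
        return false;                      // compilation error

    regmatch_t match[4];                   // whole match plus three groups
    int rc = regexec(&preg, input.c_str(), 4, match, 0);
    regfree(&preg);                        // release the compiled pattern
    if (rc != 0)
        return false;                      // REG_NOMATCH or another error

    out.month = group(input, match[1]);
    out.day   = group(input, match[2]);
    out.year  = group(input, match[3]);
    return true;
}
```

Returning the parts instead of printing them is only a design choice for the sketch; the original `cout` lines would work just as well once the lengths are computed this way.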
kennytm