
I'm not sure whether this is the best place to ask, so apologies in advance if it isn't.

I need to extract multiple dates from a string. The date format can vary from string to string (although the two dates within a single string should share the same format), and the text around the dates can vary as well. I have no control over the strings, but the dates will always be in UK day-month order. Example strings include, but are not limited to:

From 1 March 1960 To 1 March 2235

For a period starting 1/3/1960 and ending 1/3/2235

Starting 1.3.1960 and ending 1.3.2235

My current thinking is to run a number of regexes on the string, one for each potential format, with some logic to decide which to try first (for example, if the string contains '/', I'd run the regex variants that use that separator first).

However, I was hoping there is a better way to achieve this. I've found out that the environment this will run in may not be able to call web services, so I am looking for a self-contained solution if possible.
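For illustration only, the idea described above might be sketched roughly as follows. The class name `DatePatternPicker`, its method names, and the three patterns are assumptions made up for this sketch, covering only the three example strings; it returns the raw date substrings and does not parse or validate them.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DatePatternPicker
{
    // Hypothetical patterns, one per date style seen in the example strings.
    private const string Slashed = @"\b\d{1,2}/\d{1,2}/\d{4}\b";
    private const string Dotted = @"\b\d{1,2}\.\d{1,2}\.\d{4}\b";
    private const string Named = @"\b\d{1,2}\s+[A-Za-z]+\s+\d{4}\b";

    // Order the candidate patterns so that the most likely one runs first.
    public static IEnumerable<string> Candidates(string text)
    {
        if (text.Contains("/")) yield return Slashed;
        if (text.Contains(".")) yield return Dotted;
        yield return Named;
    }

    // Return the raw date substrings found by the first pattern that matches.
    public static List<string> FindDates(string text)
    {
        foreach (string pattern in Candidates(text))
        {
            MatchCollection matches = Regex.Matches(text, pattern);
            if (matches.Count > 0)
            {
                var found = new List<string>();
                foreach (Match m in matches) found.Add(m.Value);
                return found;
            }
        }
        return new List<string>(); // no pattern matched
    }
}
```

Calling `DatePatternPicker.FindDates` on a string returns an empty list when none of the patterns match, so the caller can tell "no dates" apart from a successful match.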

Ian Boggs
  • Is it safe to assume that "From" date always comes before the "To" date? This is more of a general programming task and not strictly related to c#, I would say. – MarengoHue Feb 21 '19 at 09:09
  • Yes, the From date should always come before the To – Ian Boggs Feb 21 '19 at 09:12
  • Have you considered some natural language processing of the two parts? This question here might help: https://stackoverflow.com/questions/23689/natural-language-date-time-parser-for-net – Iain M Norman Feb 21 '19 at 09:12
  • I have experience of calling Luis, but that failed because the dates can be later than the year 2100, and that was before I found out I had no access to the outside world. I'll investigate that library – Ian Boggs Feb 21 '19 at 09:23

2 Answers


You could do this with two regexes and one replace, then use DateTime.ParseExact to convert each match into a DateTime object. Something like this, perhaps:

using System;
using System.Globalization;
using System.Text.RegularExpressions;

string[] lines = { "From 1 March 1960 To 1 March 2235", 
                   "For a period starting 1/3/1960 and ending 1/3/2235", 
                   "Starting 1.3.1960 and ending 1.3.2235", 
                   "Just some string with no dates in it" };

CultureInfo uk = CultureInfo.GetCultureInfo("en-GB");
// Day, month name, four-digit year, e.g. "1 March 1960".
Regex longDate = new Regex(@"\d{1,2}\s+[A-Za-z]+\s+\d{4}");
// Day, month and year separated by '.' or '/', e.g. "1/3/1960" or "1.3.1960".
Regex numericDate = new Regex(@"\d{1,2}[./]\d{1,2}[./]\d{4}");

foreach (string line in lines) {

    // Echo the input line in yellow so it stands out from the results.
    Console.ForegroundColor = ConsoleColor.Yellow;
    Console.WriteLine(Environment.NewLine + line);
    Console.ResetColor();

    if (longDate.IsMatch(line))
    {
        foreach (Match match in longDate.Matches(line))
        {
            // ParseExact throws a FormatException if the month name is not a real month.
            DateTime dte = DateTime.ParseExact(match.Value, "d MMMM yyyy", uk);
            Console.WriteLine(dte.ToShortDateString());
        }
    }
    else if (numericDate.IsMatch(line))
    {
        foreach (Match match in numericDate.Matches(line))
        {
            // Normalise '.' to '/' so a single format string covers both styles.
            DateTime dte = DateTime.ParseExact(match.Value.Replace(".", "/"), "d/M/yyyy", uk);
            Console.WriteLine(dte.ToShortDateString());
        }
    }
    else { Console.WriteLine("No valid date found."); }

}

The above returns

From 1 March 1960 To 1 March 2235
1/3/1960
1/3/2235

For a period starting 1/3/1960 and ending 1/3/2235
1/3/1960
1/3/2235

Starting 1.3.1960 and ending 1.3.2235
1/3/1960
1/3/2235

Just some string with no dates in it
No valid date found.
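
If the parsed dates are needed as values rather than console output, the same idea could be wrapped in a helper along these lines. This is only a sketch: the class name `DateRangeExtractor`, the method `Extract`, the combined pattern, and the choice of `CultureInfo.InvariantCulture` (so that English month names parse regardless of machine settings) are assumptions, not part of the code above. It uses `DateTime.TryParseExact` so that a candidate that is not a real date is skipped instead of throwing.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class DateRangeExtractor
{
    // One loose pattern: numeric dates with '/' or '.', or day + month name + year.
    private static readonly Regex Candidate = new Regex(
        @"\b\d{1,2}(?:[./]\d{1,2}[./]|\s+[A-Za-z]+\s+)\d{4}\b");

    // Day-first formats tried in order; '.' is normalised to '/' before parsing.
    private static readonly string[] Formats = { "d/M/yyyy", "d MMMM yyyy", "d MMM yyyy" };

    public static List<DateTime> Extract(string text)
    {
        var dates = new List<DateTime>();
        foreach (Match m in Candidate.Matches(text))
        {
            // Collapse runs of whitespace and unify the separator.
            string value = Regex.Replace(m.Value, @"\s+", " ").Replace('.', '/');
            DateTime parsed;
            if (DateTime.TryParseExact(value, Formats, CultureInfo.InvariantCulture,
                                       DateTimeStyles.None, out parsed))
            {
                dates.Add(parsed); // keep only candidates that are real calendar dates
            }
        }
        return dates;
    }
}
```

Since the question states that the "From" date always comes first, the first and second elements of the returned list can be treated as the start and end of the period; a list with fewer than two entries means the string did not contain two parseable dates.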
Theo

Try this regex, which also checks the day against the month (including 29 February in leap years):

\b(?:(?:31(\/|-| |\.)(?:0?[13578]|1[02]|(?:Jan|January|Mar|March|May|Jul|July|Aug|August|Oct|October|Dec|December)))\1|(?:(?:29|30)(\/|-| |\.)(?:0?[13-9]|1[0-2]|(?:Jan|January|Mar|March|Apr|April|May|Jun|June|Jul|July|Aug|August|Sep|September|Oct|October|Nov|November|Dec|December))\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})\b|\b(?:29(\/|-| |\.)(?:0?2|(?:Feb|February))\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))\b|\b(?:0?[1-9]|1\d|2[0-8])(\/|-| |\.)(?:(?:0?[1-9]|(?:Jan|January|Feb|February|Mar|March|Apr|April|May|Jun|June|Jul|July|Aug|August|Sep|September))|(?:1[0-2]|(?:Oct|October|Nov|November|Dec|December)))\4(?:(?:1[6-9]|[2-9]\d)?\d{2})\b

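
As a point of comparison, and not as part of the regex above, the calendar checks that such a pattern encodes (days per month, leap years) can also be left to the framework once a loosely matched candidate has been extracted. The sketch below assumes numeric day/month/year candidates only; the names `CalendarCheck` and `IsRealDate` are made up for illustration.

```csharp
using System;
using System.Globalization;

public static class CalendarCheck
{
    // True when the candidate (d/M/yyyy or d.M.yyyy) is a real calendar date.
    public static bool IsRealDate(string candidate)
    {
        DateTime ignored;
        return DateTime.TryParseExact(
            candidate.Replace('.', '/'),   // treat '.' and '/' the same way
            "d/M/yyyy",                    // day first, as in the UK-ordered input
            CultureInfo.InvariantCulture,
            DateTimeStyles.None,
            out ignored);
    }
}
```

With this, a candidate such as 29/2/2100 is rejected because 2100 is not a leap year, while 29/2/2000 is accepted, without that rule having to be spelled out in the pattern itself.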

Matt.G