2

I have a string with 3 dates in it like this:

XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx

I want to select the 2nd date in the string, the 20180208 one.

Is there a way to do this purely in the regex, without having to resort to pulling out the 2nd match in code? I'm using C#, if that matters.

Thanks for any help.

Jan
handles

4 Answers

3

You could use

^(?:[^_]+_){2}(\d+)

and take the first capture group; see a demo on regex101.com.


Broken down, this says:
^              # start of the string
(?:[^_]+_){2}  # one or more non-underscore characters followed by _, twice
(\d+)          # capture one or more digits

C# demo:

using System;
using System.Text.RegularExpressions;

var pattern = @"^(?:[^_]+_){2}(\d+)";
var text = "XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx";
var result = Regex.Match(text, pattern).Groups[1].Value; // Regex.Match never returns null
Console.WriteLine(result); // => 20180208
Wiktor Stribiżew
Jan
  • 1
    `^(?:([^_]*)_){3}` and capture the 1st group would also work, since only the last match of a group is captured. Not that it is better. – Aaron Feb 21 '18 at 20:38
  • @Aaron: Feel free to post it as an alternative answer as it works as well. – Jan Feb 21 '18 at 20:43
  • Meh, better use yours IMO, 13 steps vs 19 for mine in regex101. It was more to showcase a regex feature than anything. – Aaron Feb 21 '18 at 20:47
  • Your regex gives an incorrect result if the string begins with the first date or if the dates are positioned differently in the string. [Demo](https://regex101.com/r/DQBnnh/3/). – Cary Swoveland Apr 17 '20 at 21:10
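
Aaron's remark about repeated groups can be checked with a short sketch. It assumes the sample string from the question; the variable names are illustrative only. In .NET, a capturing group that sits inside a repeated group reports its last iteration through `Value`, while the earlier iterations remain available through `Captures`.

```csharp
using System;
using System.Text.RegularExpressions;

var text = "XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx";

// The capturing group sits inside a non-capturing group that repeats three times.
var match = Regex.Match(text, @"^(?:([^_]*)_){3}");
var segment = match.Groups[1];

// Value reflects only the last iteration of the repeated group.
Console.WriteLine(segment.Value);        // 20180208

// .NET additionally keeps every iteration in Captures.
foreach (Capture capture in segment.Captures)
{
    Console.WriteLine(capture.Value);    // XXXXX, then 20160207, then 20180208
}
```

Like the accepted pattern, this counts underscore-separated fields rather than dates, so it depends on the date being the third field.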
0

Try this one

MatchCollection matches = Regex.Matches(sInputLine, @"\d{8}");

string sSecond = matches[1].ToString();
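
A slightly more defensive sketch of the same idea, in case the input might contain fewer than two 8-digit runs (indexing `matches[1]` would then throw). `SecondEightDigitRun` is a hypothetical helper name, and returning an empty string on failure is an assumption, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the second run of 8 digits,
// or an empty string when the input has fewer than two such runs.
static string SecondEightDigitRun(string input)
{
    MatchCollection matches = Regex.Matches(input, @"\d{8}");
    return matches.Count >= 2 ? matches[1].Value : string.Empty;
}

Console.WriteLine(SecondEightDigitRun("XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx")); // 20180208
Console.WriteLine(SecondEightDigitRun("XXXXX_20160207_xxxxx").Length);                          // 0
```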

AJP
0

You could use the regular expression

^(?:.*?\d{8}_){1}.*?(\d{8})

to save the 2nd date to capture group 1.

Demo

Naturally, for n > 2, replace {1} with {n-1} to obtain the nth date. To obtain the 1st date use

^(?:.*?\d{8}_){0}.*?(\d{8})

Demo

C#'s regex engine performs the following operations.

^        # match the beginning of the string
(?:      # begin a non-capture group
  .*?    # match 0+ chars lazily
  \d{8}  # match 8 digits
  _      # match '_'
)        # end non-capture group
{n}      # execute non-capture group n (n >= 0) times
.*?      # match 0+ chars lazily     
(\d{8})  # match 8 digits in capture group 1

The important thing to note is that the first instance of .*?, because it is lazy, consumes as few characters as possible, stopping as soon as the next 8 characters are digits followed by '_'. When the digits are also preceded by '_', as in the pattern below, they can be neither preceded nor followed by another digit. For example, in the string

_1234abcd_efghi_123456789_12345678_ABC

capture group 1 in (.*?)_\d{8}_ will contain "_1234abcd_efghi_123456789".
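
The `{n-1}` substitution described above can be sketched in C# as follows. `NthDate` is a hypothetical helper name, and returning an empty string when no nth date exists is an assumption made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the nth (1-based) 8-digit date,
// or an empty string if the input does not contain one.
static string NthDate(string input, int n)
{
    if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));

    // Skip n-1 dates that are each followed by '_', then capture the next 8 digits.
    string pattern = @"^(?:.*?\d{8}_){" + (n - 1) + @"}.*?(\d{8})";
    Match match = Regex.Match(input, pattern);
    return match.Success ? match.Groups[1].Value : string.Empty;
}

var text = "XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx";
Console.WriteLine(NthDate(text, 1)); // 20160207
Console.WriteLine(NthDate(text, 2)); // 20180208
Console.WriteLine(NthDate(text, 3)); // 20190408
```

Because every skipped date must be followed by `_`, asking for a 4th date in this string yields the empty string rather than a partial match.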

Cary Swoveland
-1

You can use System.Text.RegularExpressions.Regex.

See the following example:

// Expression from Jan's answer; this just shows how to use it from C#.
Regex regex = new Regex(@"^(?:[^_]+_){2}(\d+)");
Match match = regex.Match("XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx");
if (match.Success) // only read the group if the pattern matched
{
    Console.WriteLine(match.Groups[1].Value);
}
Mohammad Ali