
I need a regex to extract the date section from the names of several files.

In particular I have these two formats:

  • ATC0200720140828080610.xls
  • ATC0200720140901080346_UFF_ACC.xls

I use these two regexes to check the file name format:

  • ^ATC02007[0-9]{14}.xls$
  • ^ATC02007[0-9]{14}_UFF_ACC.xls$

But I need a regex to extract a specific section:

constant | yyyyMMddHHmmss |   constant
    ^            ^               ^
ATC02007 | 20140901080346 | _UFF_ACC.xls

Both regexes I'm using match the entire file name, so I can't use them to extract the middle section. What is the right expression?

davioooh

2 Answers


You are almost there. Just wrap the digits you want in parentheses to make them a capturing group, and make the suffix optional so one expression covers both formats.

^ATC02007([0-9]{14})(_UFF_ACC)?\.xls$

The digits are captured in group 1 ($1).
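A minimal Java sketch of how group 1 might be read, assuming the file name is already available as a string. The class name DateSection and the method extractDate are illustrative names, not from the question, and the dot before xls is escaped here so that it only matches a literal dot:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateSection {
    // The 14 digits are capturing group 1; the _UFF_ACC suffix is optional.
    private static final Pattern NAME =
            Pattern.compile("^ATC02007([0-9]{14})(_UFF_ACC)?\\.xls$");

    // Returns the 14-digit section, or null if the name does not match.
    public static String extractDate(String fileName) {
        Matcher m = NAME.matcher(fileName);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractDate("ATC0200720140828080610.xls"));
        System.out.println(extractDate("ATC0200720140901080346_UFF_ACC.xls"));
    }
}
```

Returning null for a non-matching name is just one possible choice; throwing an exception or returning an Optional would work as well.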

Steven Xu

You need to use capturing groups.

^(ATC02007)([0-9]{14})((?:[^.]*)?\.xls)$


Group 1 contains the first constant, group 2 contains the date and time, and group 3 contains the trailing constant. In a Java string literal the backslash must be doubled, so \. is written as \\.

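import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractDate {
    public static void main(String[] args) {
        String s = "ATC0200720140828080610.xls\n"
                + "ATC0200720140901080346_UFF_ACC.xls";
        // (?m) lets ^ and $ match at every line, so each file name is tested separately
        Pattern regex = Pattern.compile("(?m)^(ATC02007)([0-9]{14})((?:[^.]*)?\\.xls)$");
        Matcher matcher = regex.matcher(s);
        while (matcher.find()) {
            System.out.println(matcher.group(1)); // leading constant
            System.out.println(matcher.group(2)); // yyyyMMddHHmmss section
            System.out.println(matcher.group(3)); // trailing constant
        }
    }
}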

Output:

ATC02007
20140828080610
.xls
ATC02007
20140901080346
_UFF_ACC.xls
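
If the extracted section then needs to become an actual date-time value, group 2 could be handed to java.time. This is only a sketch, assuming Java 8 or later and the yyyyMMddHHmmss layout given in the question; the class name FileTimestamp and the method parse are hypothetical names chosen for illustration:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileTimestamp {
    private static final Pattern NAME =
            Pattern.compile("^(ATC02007)([0-9]{14})((?:[^.]*)?\\.xls)$");
    // Layout taken from the question: yyyyMMddHHmmss
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // Throws IllegalArgumentException if the name does not match the pattern,
    // and DateTimeParseException if the digits are not a valid date-time.
    public static LocalDateTime parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected file name: " + fileName);
        }
        return LocalDateTime.parse(m.group(2), FORMAT);
    }

    public static void main(String[] args) {
        System.out.println(parse("ATC0200720140901080346_UFF_ACC.xls"));
    }
}
```

Parsing also acts as a validity check: fourteen digits that do not form a real date-time, such as month 13, are rejected even though the regex accepts them.
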
Avinash Raj