1

I would like some help with a regular expression.

I have some files like this:

  • JWE-766.1.pdf
  • JWE-766.2.pdf
  • JWE-768.1.pdf
  • JWE-770.1.pdf

I would like a regex pattern to extract the number after 'JWE-', e.g. 766 from JWE-766.1.pdf.

I would also like a regex to extract the 1 and 2 from JWE-766.1.pdf and JWE-766.2.pdf respectively.

Any help would be hugely appreciated.

Thanks guys!

Flukey

4 Answers

2
Pattern p = Pattern.compile("^JWE-([0-9]+)\\.([0-9]+)\\.pdf$");
Matcher m = p.matcher("your string here");

if (m.find()) {
    System.out.println(m.group(1)); //first number group
    System.out.println(m.group(2)); //second number group
}

Taken from here

Also, make sure to reuse the Pattern p object if you're looping through a series of strings, rather than recompiling it for each one.
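For what it's worth, here is a minimal sketch of what reusing the compiled pattern across several file names might look like. The class name `JweFileNames` and the helper `parse` are made up for illustration and are not part of the answer above; the file names are the ones from the question plus JWE-11.1.pdf from the comments.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JweFileNames {
    // Compiled once and reused for every file name.
    private static final Pattern JWE =
            Pattern.compile("^JWE-([0-9]+)\\.([0-9]+)\\.pdf$");

    // Hypothetical helper: returns {first number, second number},
    // or null when the name does not match the pattern.
    static int[] parse(String fileName) {
        Matcher m = JWE.matcher(fileName);
        if (!m.matches()) {
            return null;
        }
        return new int[] {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2))
        };
    }

    public static void main(String[] args) {
        String[] names = {
            "JWE-766.1.pdf", "JWE-766.2.pdf", "JWE-768.1.pdf",
            "JWE-770.1.pdf", "JWE-11.1.pdf"
        };
        for (String name : names) {
            int[] parts = parse(name);
            if (parts != null) {
                System.out.println(name + " -> " + parts[0] + ", " + parts[1]);
            }
        }
    }
}
```

Only the Matcher is created per string; the Pattern itself is built a single time.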

sigint
  • Hey. Thanks very much. Alas, that pattern has an error of "Invalid Escape Sequence". Hmmmn.....I shall find out why :-) – Flukey Jul 21 '10 at 16:22
  • It may be possible that Java is complaining over the `\\d` part. I changed it to use `[0-9]` instead. – sigint Jul 21 '10 at 16:27
  • `\\d` is correct. He was probably using your regex in its original form, before you doubled all the backslashes. – Alan Moore Jul 21 '10 at 20:34
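To illustrate the escaping point raised in these comments, here is a small sketch, assuming the pattern is written as a Java string literal. The class name `EscapeDemo` and its `matches` helper are invented for this example.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // The regex as the regex engine sees it:  ^JWE-(\d+)\.(\d+)\.pdf$
    // In a Java string literal every backslash has to be doubled,
    // because a single "\d" or "\." is not a valid string escape
    // and the compiler rejects it.
    static final String REGEX = "^JWE-(\\d+)\\.(\\d+)\\.pdf$";

    // Hypothetical helper: true when the whole name matches REGEX.
    static boolean matches(String fileName) {
        return Pattern.matches(REGEX, fileName);
    }

    public static void main(String[] args) {
        // Prints the regex with single backslashes, as the engine receives it.
        System.out.println(REGEX);
        System.out.println(matches("JWE-766.1.pdf")); // true
        System.out.println(matches("JWE-766x1.pdf")); // false: "\\." needs a literal dot
    }
}
```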
1

Unless there is more variety to the pattern than this, I would just use substring manipulation in this case.

i.e.

String s = "JWE-766.1.pdf";
// Text between the "-" and the first "."
String firstNumber = s.substring(s.indexOf("-") + 1, s.indexOf("."));       // "766"
// Text between the first "." and the last "."
String secondNumber = s.substring(s.indexOf(".") + 1, s.lastIndexOf(".")); // "1"
µBio
  • Hello there. I would have done this approach, however, this is a problem because I also have files like: JWE-11.1.pdf – Flukey Jul 21 '10 at 16:18
  • @Jamie updated my answer to reflect your detail. To me, it still feels like substring is the better choice in this case, but to each his own :) – µBio Jul 21 '10 at 16:32
1

JWE-(\d+)\.(\d+)\.pdf

should do the trick.

Of course, when you write the pattern as a Java string literal, each backslash must be doubled:

Pattern p = Pattern.compile("JWE-(\\d+)\\.(\\d+)\\.pdf");
Matcher m = p.matcher(s); // s contains your filename
if (m.matches()) {
   String fullName = m.group(0);                   // the whole match
   int firstIndex = Integer.parseInt(m.group(1));  // 766
   int secondIndex = Integer.parseInt(m.group(2)); // 1
}

Have fun

Elf King
  • Thanks :-) I marked sigint's answer as the right one so he could get some reputation. I have upvoted yours though. Thanks again! :D – Flukey Jul 21 '10 at 16:27
  • :-) i voted for his answer too coz it appeared while i was finishing mine – Elf King Jul 21 '10 at 16:32
1

You can use parentheses for capturing groups, and then use Matcher.group(int) to retrieve them after matching.

Try the pattern "^JWE-(\\d+)\\.(\\d+)\\.pdf$" (backslashes doubled for a Java string literal); group 1 should be the 766, and group 2 should be the 1.

However, as stated above, if the file names are consistent in length, straight manipulation by index will be faster.

...one minute too slow. The Elf King is quick like the wind.

DVA