I need to match a string of the form aaaa_bb_cc and have written this regex pattern:

\\w{4}+\\_\\w{2}\\_\\w{2}

It works. Is there a simpler regex that does the same thing?

nhahtdh
ukanth

4 Answers


You don't need to escape the underscores:

\w{4}+_\w{2}_\w{2}

And you can collapse the last two parts, if you don't capture them anyway:

\w{4}+(?:_\w{2}){2}

That doesn't make it any shorter, though.

(Note: Re-add the needed backslashes for Java's strings, if you like; I prefer to omit them while talking about regular expressions :))
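To try these variants from Java, a minimal sketch like the following could be used. The class and method names are made up for illustration. Note that `{4}+` is a possessive quantifier in Java; with a fixed repeat count it behaves the same as plain `{4}`, so the `+` could be dropped as well.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative, not from the question.
class UnderscorePatternDemo {
    // Original pattern from the question, written as a Java string literal.
    static final Pattern ORIGINAL = Pattern.compile("\\w{4}+\\_\\w{2}\\_\\w{2}");
    // Same pattern without the redundant escapes on the underscores.
    static final Pattern UNESCAPED = Pattern.compile("\\w{4}+_\\w{2}_\\w{2}");
    // Last two parts collapsed into a repeated non-capturing group.
    static final Pattern COLLAPSED = Pattern.compile("\\w{4}+(?:_\\w{2}){2}");

    // True only if all three patterns match the entire input.
    static boolean matchesAll(String input) {
        return ORIGINAL.matcher(input).matches()
            && UNESCAPED.matcher(input).matches()
            && COLLAPSED.matcher(input).matches();
    }

    // True only if none of the three patterns match the entire input.
    static boolean matchesNone(String input) {
        return !ORIGINAL.matcher(input).matches()
            && !UNESCAPED.matcher(input).matches()
            && !COLLAPSED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesAll("aaaa_bb_cc")); // prints "true"
        System.out.println(matchesNone("aaa_bb_cc")); // prints "true" (first part too short)
    }
}
```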

Joey

Yes, you can use just \\w{4}_\\w{2}_\\w{2} or maybe \\w{4}(_\\w{2}){2}.

Igor Artamonov

It looks like your \w does not need to match the underscore, so you can use [a-zA-Z0-9] instead:

[a-zA-Z0-9]{4}_[a-zA-Z0-9]{2}_[a-zA-Z0-9]{2}
YOU
  • Missed that one. However, is `\w` in Java really only `[a-zA-Z0-9]`? In .NET at least both `\d` and `\w` match pretty much anything counting as decimal number or letter. – Joey May 06 '10 at 12:28
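On the question raised in the comment: in Java, `\w` is by default the ASCII class `[a-zA-Z_0-9]`, so it includes the underscore but not letters such as "é". The `Pattern.UNICODE_CHARACTER_CLASS` flag (Java 7 and later) switches it to the Unicode definition. A small sketch to check this, with illustrative class and method names:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative.
class WordClassDemo {
    // Default behaviour: \w is [a-zA-Z_0-9].
    static final Pattern ASCII_WORD = Pattern.compile("\\w");
    // With the flag, \w follows the Unicode definition of a word character.
    static final Pattern UNICODE_WORD =
        Pattern.compile("\\w", Pattern.UNICODE_CHARACTER_CLASS);

    static boolean isAsciiWord(String s) {
        return ASCII_WORD.matcher(s).matches();
    }

    static boolean isUnicodeWord(String s) {
        return UNICODE_WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // the letter "é"
        System.out.println(isAsciiWord("_"));      // prints "true"
        System.out.println(isAsciiWord(eAcute));   // prints "false"
        System.out.println(isUnicodeWord(eAcute)); // prints "true"
    }
}
```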

I sometimes do what I call "meta-regexing" as follows:

    String pattern = "x{4}_x{2}_x{2}".replace("x", "[a-z]");
    System.out.println(pattern); // prints "[a-z]{4}_[a-z]{2}_[a-z]{2}"

Note that this doesn't use \w, which can match an underscore. That is, your original pattern would match "__________".

If x really needs to be replaced with [a-zA-Z0-9], then just do it in the one place (instead of 3 places).
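As a rough illustration of both points, the sketch below derives the pattern from a template and compares it with the `\w` version on a string of ten underscores. The class name and the `derive` helper are hypothetical, introduced only for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative.
class MetaRegexDemo {
    // Template with "x" as a placeholder for the character class.
    static final String TEMPLATE = "x{4}_x{2}_x{2}";

    // Substitute the placeholder in one place instead of repeating the class.
    static String derive(String charClass) {
        return TEMPLATE.replace("x", charClass);
    }

    // Build a string of n underscores.
    static String underscores(int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, '_');
        return new String(chars);
    }

    public static void main(String[] args) {
        String tenUnderscores = underscores(10);

        // \w includes '_', so the original-style pattern accepts this input.
        System.out.println(Pattern.matches(derive("\\w"), tenUnderscores)); // prints "true"

        // The derived patterns reject it but still accept the intended shape.
        System.out.println(Pattern.matches(derive("[a-z]"), tenUnderscores));       // prints "false"
        System.out.println(Pattern.matches(derive("[a-zA-Z0-9]"), "ab12_C3_d4"));   // prints "true"
    }
}
```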

Other examples

polygenelubricants
  • @UK: Essentially the idea is that you don't need to have the actual regex explicitly written out. If it makes it more readable/maintainable to derive the regex programmatically, then go ahead – polygenelubricants May 06 '10 at 12:16