3

The following regex matches my pattern, but I am wondering if there is a way to shorten it. I can't use \w because I want only English letters (upper or lower case), with no digits or underscores. Because the pattern repeats, I am wondering whether I can group the repeated part.

([A-Za-z]{5}\.[A-Za-z]{3}\.[A-Za-z]{3}\.[A-Za-z]{3}\.[0-9]{3}\.[0-9]{2})\.([0-9]{8}\-[0-9]{6})\.csv
nhahtdh
would_like_to_be_anon
  • This is a JavaScript regex question I presume? It's pretty general, but that is still helpful to know. – J0e3gan Feb 13 '16 at 18:36

2 Answers

4

You can shorten it a bit by repeating the `\.[A-Za-z]{3}` part as a group:

([A-Za-z]{5}(\.[A-Za-z]{3}){3}\.[0-9]{3}\.[0-9]{2})\.([0-9]{8}-[0-9]{6})\.csv
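One thing worth checking when folding the repetition into a group is what happens to the capture numbering. Here is a minimal sketch in Python (my choice of language, the question doesn't name one); the file name is made up for illustration and the variable names are hypothetical.

```python
import re

# The regex from the question, split over two lines only for readability.
ORIGINAL = re.compile(
    r"([A-Za-z]{5}\.[A-Za-z]{3}\.[A-Za-z]{3}\.[A-Za-z]{3}\.[0-9]{3}\.[0-9]{2})"
    r"\.([0-9]{8}\-[0-9]{6})\.csv"
)

# The shortened regex with the repeated part grouped.
SHORTENED = re.compile(
    r"([A-Za-z]{5}(\.[A-Za-z]{3}){3}\.[0-9]{3}\.[0-9]{2})\.([0-9]{8}-[0-9]{6})\.csv"
)

# Hypothetical file name, invented for this example.
SAMPLE = "abcde.fgh.ijk.lmn.123.45.20140222-064600.csv"

original_groups = ORIGINAL.fullmatch(SAMPLE).groups()
shortened_groups = SHORTENED.fullmatch(SAMPLE).groups()

print(original_groups)   # two groups: the name part and the timestamp
print(shortened_groups)  # three groups: the repeated group sits in the middle
```

Both patterns accept the same string, but the new parentheses add a capturing group, so the timestamp moves from group 2 to group 3. If the group numbers matter to the calling code, a non-capturing group `(?:\.[A-Za-z]{3}){3}` keeps them unchanged in engines that support that syntax.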
anubhava
3

\d instead of [0-9] is an obvious way to shorten it a bit:

([A-Za-z]{5}\.[A-Za-z]{3}\.[A-Za-z]{3}\.[A-Za-z]{3}\.\d{3}\.\d{2})\.(\d{8}\-\d{6})\.csv

Next, consolidate the repeated pattern that @anubhava pointed out:

([A-Za-z]{5}\.([A-Za-z]{3}\.){3}\d{3}\.\d{2})\.(\d{8}\-\d{6})\.csv

Setting case insensitivity at the outset will shorten the regex a bit further...

(?i)([a-z]{5}\.([a-z]{3}\.){3}\d{3}\.\d{2})\.(\d{8}\-\d{6})\.csv

...while also matching the .CSV extension (not just .csv), which you may not have considered but would typically be valid.
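To see the effect of the inline flag, here is a small sketch in Python, which is an assumption on my part since the question doesn't name a language. Inline `(?i)` is engine-dependent; engines without it usually take the flag as a separate option instead. The file names are invented.

```python
import re

# Case-insensitive version of the pattern; (?i) must come first in Python.
PATTERN = re.compile(
    r"(?i)([a-z]{5}\.([a-z]{3}\.){3}\d{3}\.\d{2})\.(\d{8}\-\d{6})\.csv"
)

# Hypothetical file names, invented for this example.
lower = PATTERN.fullmatch("abcde.fgh.ijk.lmn.123.45.20140222-064600.csv")
upper = PATTERN.fullmatch("ABCDE.FGH.IJK.LMN.123.45.20140222-064600.CSV")
wrong = PATTERN.fullmatch("abc1e.fgh.ijk.lmn.123.45.20140222-064600.csv")

print(lower is not None)  # lower-case name and extension
print(upper is not None)  # upper-case name and extension
print(wrong is None)      # a digit in a letters-only segment is rejected
```

A Python-specific caveat: per the `re` documentation, `[a-z]` combined with `IGNORECASE` on str patterns also matches a handful of non-ASCII letters, so adding the ASCII flag (`(?ai)`) is the safer choice there if only English letters are wanted.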

Finally, the two capturing groups around the name and the timestamp (4 parentheses) may be dispensable if you don't need to extract those parts:

(?i)[a-z]{5}\.([a-z]{3}\.){3}\d{3}\.\d{2}\.\d{8}\-\d{6}\.csv
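A quick sketch of what is left to capture after dropping those parentheses, again in Python as an assumed language and with an invented file name. The second pattern is my own variation, not part of the answer: it swaps the remaining group for a non-capturing one.

```python
import re

# The final pattern: only the repeated group still captures.
FINAL = re.compile(r"(?i)[a-z]{5}\.([a-z]{3}\.){3}\d{3}\.\d{2}\.\d{8}\-\d{6}\.csv")

# Variation (not from the answer): non-capturing group, so nothing is captured.
NON_CAPTURING = re.compile(
    r"(?i)[a-z]{5}\.(?:[a-z]{3}\.){3}\d{3}\.\d{2}\.\d{8}-\d{6}\.csv"
)

# Hypothetical file name, invented for this example.
SAMPLE = "abcde.fgh.ijk.lmn.123.45.20140222-064600.csv"

final_groups = FINAL.fullmatch(SAMPLE).groups()
non_capturing_groups = NON_CAPTURING.fullmatch(SAMPLE).groups()
too_short = FINAL.fullmatch("abcd.fgh.ijk.lmn.123.45.20140222-064600.csv")

print(final_groups)          # one group holding the last repetition
print(non_capturing_groups)  # empty tuple
print(too_short)             # None: the first segment has only 4 letters
```

This is fine for a plain yes/no validation. If the name or timestamp needs to be pulled out later, the earlier version with the capturing groups is the one to keep.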
J0e3gan
  • 3
    `\d` is an obvious thought at first, but it isn't exactly equal to `[0-9]`. It includes more characters: http://stackoverflow.com/a/16621778/72478 – cherouvim Feb 22 '14 at 06:46
  • @cherouvim: I don't typically run regexes against Unicode input strings, but that is a fair point if Unicode input strings are a concern. – J0e3gan Feb 22 '14 at 06:52
  • 2
    @cherouvim: It depends on the language. C# matches Unicode digits by default, but Java and JavaScript don't. From Java 7 on, the `(?U)` flag can be used to make `\d` match Unicode digits, but only once that flag is set. – nhahtdh Feb 22 '14 at 06:55
  • These are great points regarding Unicode (and performance) considerations. Thanks! – J0e3gan Feb 22 '14 at 06:58
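For anyone wanting to see the `\d` versus `[0-9]` difference concretely, here is a small sketch. It uses Python 3, which the comments above don't cover, so treat it as an added illustration: there, `\d` on str patterns matches Unicode decimal digits unless the ASCII flag is set.

```python
import re

# Arabic-Indic digits one, two, three (not ASCII 0-9).
ARABIC_INDIC = "\u0661\u0662\u0663"

# Default str-pattern behaviour in Python 3: \d is Unicode-aware.
unicode_default = re.fullmatch(r"\d{3}", ARABIC_INDIC)

# With re.ASCII, \d is restricted to 0-9.
ascii_flag = re.fullmatch(r"\d{3}", ARABIC_INDIC, flags=re.ASCII)

# An explicit class never matches non-ASCII digits.
explicit_class = re.fullmatch(r"[0-9]{3}", ARABIC_INDIC)

print(unicode_default is not None)
print(ascii_flag is None)
print(explicit_class is None)
```

So whether `\d` is a safe shorthand for `[0-9]` depends on the engine and its flags; when the input may contain non-ASCII digits, the explicit class is the unambiguous choice.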