-1

I have the following valid regex to match various Excel cell/range patterns of the form A1, A1:Z12, etc.

[image: regex pattern]

Is there a more compact way to do the second part of the match? Basically, for the : <repeat> part, I was hoping to be able to do it with something like:

  • ^ (<main_part> ':'<lookahead, keep if before an A-Z> ){1,2} $

Is there any way to write that pattern?

David542
  • 1
    See [Is it possible to define a pattern and reuse it to capture multiple groups?](https://stackoverflow.com/q/41878948/3832970) – Wiktor Stribiżew Nov 23 '21 at 07:49
  • 1
    @WiktorStribiżew perfect. I suppose the only downside of that is you have to use a capturing group at the start: `^([A-Z]{1,3}[0-9]{1,10})(:(?1))?$` – David542 Nov 23 '21 at 08:30

2 Answers

1

One way to do it without capture groups or lookarounds is to use a word boundary:

^(?:\b:?[A-Z]{1,3}[0-9]{1,10}){1,2}$

The word-boundary can't succeed between the start of the string and a colon nor between a digit and a letter, but it does between a digit and a colon or between the start of the string and a letter.

For the same reasons, the optional colon and the word boundary can also be placed at the end of the group:

^(?:[A-Z]{1,3}[0-9]{1,10}:?\b){1,2}$

(This version saves one matching step, yay!)
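
A quick way to check that both patterns accept the same strings is to run them over a handful of inputs. The sketch below uses Python's `re` module; the sample strings and the `accepts` helper are illustrative assumptions, not part of the answer.

```python
import re

# The two patterns from the answer, copied verbatim.
LEADING = re.compile(r"^(?:\b:?[A-Z]{1,3}[0-9]{1,10}){1,2}$")
TRAILING = re.compile(r"^(?:[A-Z]{1,3}[0-9]{1,10}:?\b){1,2}$")

# Hypothetical sample inputs chosen for illustration.
SAMPLES = ["A1", "A1:Z12", ":A2", "A2:", "A2A2", "A1:B2:C3"]


def accepts(pattern, text):
    # True when the anchored pattern matches the whole string.
    return pattern.match(text) is not None


for text in SAMPLES:
    print(f"{text!r:12} leading={accepts(LEADING, text)} trailing={accepts(TRAILING, text)}")
```

On these samples only `A1` and `A1:Z12` should be accepted, by either pattern.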


Test cases (first pattern):

  1. with :A2
    It fails because \b fails between the start of the string and a non-word character (the colon).

  2. with A2:
    It fails because the sub-pattern cannot end with a colon: the colon is only allowed at the start of a repetition, and here there is nothing after it to repeat.

  3. with A2:A2
    The pattern succeeds. \b succeeds because the first time it is between the start of the string and a letter (a word character), and the second time because it is between a digit (a word character too) and a colon (a non-word character).
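
The three cases above can be reproduced by listing the positions where `\b` succeeds in each string. This is a minimal sketch using Python's `re` module; the `boundary_positions` helper is a hypothetical name introduced for illustration.

```python
import re

# First pattern from the answer, copied verbatim.
PATTERN = re.compile(r"^(?:\b:?[A-Z]{1,3}[0-9]{1,10}){1,2}$")


def boundary_positions(text):
    # Indexes in text where the word-boundary assertion succeeds.
    return [m.start() for m in re.finditer(r"\b", text)]


for text in (":A2", "A2:", "A2:A2"):
    matched = PATTERN.match(text) is not None
    print(text, boundary_positions(text), matched)
```

For `:A2` there is no boundary at index 0, so the first repetition cannot start. For `A2:A2` there are boundaries at index 0 and at index 2 (just before the colon), which is exactly where the two repetitions begin.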

Casimir et Hippolyte
  • oh that's such a wonderful solution thank you! Could you please explain a bit more on why the colon succeeds in one case but not another? Take, for example `:hi`, `hi:`, and `hi:hi`, how would it be different in those with the `\b` ? – David542 Nov 23 '21 at 21:05
  • 1
    @David542: `\b` is an assertion that succeeds between a word character (from `\w`, i.e. letters, digits, and underscore) and anything else, including the limits of the string. This is why it fails the first time with a string starting with a colon, and why the second time it succeeds only if there's a colon between the digit and the letter. – Casimir et Hippolyte Nov 23 '21 at 21:11
  • 1
    @David542: note that, for the second time, it isn't so different than testing if the colon is followed by a letter with a lookahead. – Casimir et Hippolyte Nov 23 '21 at 21:14
0

Here is an example pattern you can use. Note that AB:AB is not a valid range as described above, so the pattern has also been modified to use \d{1,10}:

[image: regex pattern]

A better approach is to use (?1) to recurse into the first capturing group, as in the pattern from the comments on the question:

^([A-Z]{1,3}[0-9]{1,10})(:(?1))?$

Note, however, that this technique only works if the otherwise unnecessary capturing group is kept at the beginning.
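
Python's built-in `re` module does not support the `(?1)` subroutine syntax, so the sketch below swaps in a different technique with the same goal: the cell sub-pattern is defined once as a string and interpolated twice. The names `CELL` and `RANGE` are assumptions made for illustration.

```python
import re

# Sub-pattern for a single cell reference, defined once.
CELL = r"[A-Z]{1,3}[0-9]{1,10}"

# Reuse it for the optional ":<cell>" half of the range.
# No capturing group is needed with this approach.
RANGE = re.compile(rf"^{CELL}(?::{CELL})?$")

for text in ("A1", "A1:Z12", "A1:", ":A1", "AB:AB"):
    print(text, RANGE.match(text) is not None)
```

The trade-off is that the reuse happens when the pattern string is built rather than inside the regex itself, so the compiled pattern still contains the sub-pattern twice.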

David542