regex to match two different groups with the same length

Question

I would like to construct a regex that matches two groups, where the second group consists of a single character repeated as many times as there are characters in the first group. Something like `^(\w+) (x{length of \1})`, so that, for example, `hello xxxxx` and `foo xxx` would match, but `hello xxxxy` and `foo xxy` would not. Is this possible?

The goal here is to match indentation in reStructuredText-style lists, where the second line in a list item should be indented to match the start of the text in the first line, excluding the variable-length numerical list marker. For example,

1. If the number is small,
   subsequent lines are typically
   indented three spaces.
2.  Although if the first line has
    multiple leading spaces then
    subsequent lines should reflect that.
11. And when the number starts to get
    bigger the indent will necessarily
    be bigger too.

There is no construct in all of Regex land that can count. There are attributes you can glean, like odd or even. The only exception is a Perl code construct `(?{ code })` that can easily do it. — , Jun 15 '17 at 18:36
To my knowledge, you cannot match disparate groups based solely on length. The only way this would work is if the content of the groups was the same. Then you could use backreferences. — Shammel Lee, Jun 15 '17 at 18:37
If you provide the language you are using it in, someone might come up with a solution. Actually, you may just do it in 2 steps: 1) match `^\d+` and 2) build the second pattern dynamically. — Wiktor Stribiżew, Jun 15 '17 at 18:51
What's the goal in just matching them? `where the second line in a list item should be indented to match the start of the text ` Are you trying to reformat? If so, it might be better to split each line into an array, then reconstruct it. Fwiw, the nearest you could get to counting (besides Perl) is a fixed number of group constructs and conditionals. `(\w)(\w)?(\w)?(\w)?(\w)?(\w)?\s+(?(1)\w)(?(2)\w)(?(3)\w)(?(4)\w)(?(5)\w)(?(6)\w)` which @melpomene suggests. — , Jun 16 '17 at 17:38
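The two-step idea from the comments (match the list marker first, then build the second pattern dynamically) could be sketched in Python roughly as follows. This is only an illustration under assumptions: the names `MARKER` and `continuation_pattern` are hypothetical, and the marker is assumed to be digits, a period, and one or more spaces, as in the list example in the question.

```python
import re

# Assumption: a list marker is digits, a period, then one or more spaces,
# followed by the first non-space character of the item text.
MARKER = re.compile(r"^(\d+\. +)\S")

def continuation_pattern(first_line):
    """Return a compiled pattern for continuation lines of this list item,
    or None if first_line does not start with a list marker."""
    m = MARKER.match(first_line)
    if m is None:
        return None
    indent = len(m.group(1))  # width of the marker plus its trailing spaces
    # Step 2: build the second pattern from the measured width.
    return re.compile(r"^ {%d}\S" % indent)

pat = continuation_pattern("1. If the number is small,")
print(bool(pat.match("   subsequent lines are typically")))  # indented 3 spaces
print(bool(pat.match("    indented one space too far")))     # indented 4 spaces
```

The counting happens in ordinary code rather than in the regex itself, which sidesteps the limitation described in the comments above.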

score 2 · Accepted Answer · answered Jun 15 '17 at 19:13

You can do it if your regex engine supports conditional patterns and you're willing to accept a fixed upper bound on the number of repetitions.

In that case you can do something like this:

^(\w)?(\w)?(\w)?(\w)?(\w)? (?(1)x)(?(2)x)(?(3)x)(?(4)x)(?(5)x)

This example will match up to a length of 5.
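As a rough check of the pattern above, here is a small Python sketch; Python's `re` module is one engine that supports `(?(n)...)` conditionals. The helper name `build_pattern` and its parameters are hypothetical, and the use of `fullmatch` (so that extra trailing characters are rejected) is a choice made for this illustration, not part of the original answer.

```python
import re

def build_pattern(max_len, char="x"):
    """Build the fixed-upper-bound conditional pattern for words of up to
    max_len characters (hypothetical helper, for illustration only)."""
    # One optional capture group per possible character of the first word.
    captures = "".join(r"(\w)?" for _ in range(max_len))
    # One conditional per group: require `char` only if that group matched.
    conditionals = "".join(
        "(?(%d)%s)" % (i, re.escape(char)) for i in range(1, max_len + 1)
    )
    return "^" + captures + " " + conditionals

pattern = re.compile(build_pattern(5))

for text in ["hello xxxxx", "foo xxx", "hello xxxxy", "foo xxy"]:
    print(text, bool(pattern.fullmatch(text)))
```

Generating the pattern makes it easy to raise the upper bound, but the bound is still fixed: a first word longer than `max_len` will not match.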

melpomene

Did OP really specify that the repeated character was only going to be `x`? – Olian04 Jun 16 '17 at 11:32
Note: I needed a digit pattern to match n and n or n-1 digits with n={1..5}. Example "0234-123" or "123-563". Based on above suggestion I found: `"(\d(\d)?(\d)?(\d)?(\d)?-\d(?(2)\d)?(?(3)\d)(?(4)\d)(?(5)\d))"` – eremmel Feb 12 '19 at 09:17
