I would like to construct a regex that matches two groups, with the second group consisting of a single character repeated the same number of times as the number of characters in the first group. Something like `^(\w+) (x{length of \1})`, so that, for example, `hello xxxxx` and `foo xxx` would match, but `hello xxxxy` and `foo xxy` would not. Is this possible?

The goal here is to match indentation in reStructuredText-style lists, where the second line in a list item should be indented to match the start of the text in the first line, excluding the variable-length numerical list marker. For example,

1. If the number is small,
   subsequent lines are typically
   indented three spaces.
2.  Although if the first line has
    multiple leading spaces then
    subsequent lines should reflect that.
11. And when the number starts to get
    bigger the indent will necessarily
    be bigger too.
clwainwright
  • There is no construct in all of Regex land that can count. There are attributes you can glean, like odd or even. The only exception is a Perl code construct `(?{ code })` that can easily do it. –  Jun 15 '17 at 18:36
  • To my knowledge, you cannot match disparate groups based solely on length. The only way this would work is if the content of the groups was the same. Then you could use backreferences. – Shammel Lee Jun 15 '17 at 18:37
  • Hmm, that's too bad. Thanks for the info. – clwainwright Jun 15 '17 at 18:41
  • If you provide the language you are using it in, someone might come up with a solution. Actually, you may just do it in 2 steps with 1) `^\d+` and 2) building the second pattern dynamically. – Wiktor Stribiżew Jun 15 '17 at 18:51
  • What's the goal in just matching them? `where the second line in a list item should be indented to match the start of the text ` Are you trying to reformat? If so, it might be better to split each line into an array, then reconstruct it. Fwiw, the nearest you could get to counting (besides Perl) is a fixed number of group constructs and conditionals. `(\w)(\w)?(\w)?(\w)?(\w)?(\w)?\s+(?(1)\w)(?(2)\w)(?(3)\w)(?(4)\w)(?(5)\w)(?(6)\w)` which @melpomene suggests. –  Jun 16 '17 at 17:38
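
The two-step idea from the comments can be sketched in Python, purely as an illustration: first match the list marker, then build the pattern for continuation lines from its length. The names `MARKER` and `continuation_pattern` are hypothetical, and the marker is assumed to be digits, a period, and one or more spaces.

```python
import re

# Assumed marker shape: digits, a period, then spaces before the text starts.
MARKER = re.compile(r"^(\d+\. +)\S")


def continuation_pattern(first_line):
    """Return a compiled pattern for lines indented to the text start, or None."""
    m = MARKER.match(first_line)
    if m is None:
        return None  # not a numbered list item
    indent = len(m.group(1))  # width of the marker plus its trailing spaces
    # Exactly `indent` spaces followed by a non-space character.
    return re.compile(rf"^ {{{indent}}}\S")


pattern = continuation_pattern("11. And when the number starts to get")
print(bool(pattern.match("    bigger the indent will necessarily")))
```

This avoids counting inside a single regex altogether, at the cost of needing a small amount of host-language code.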

1 Answer

You can do it if

  1. your regex engine supports conditional patterns and
  2. you're willing to accept a fixed upper bound on the number of repetitions.

In that case you can do something like this:

^(\w)?(\w)?(\w)?(\w)?(\w)? (?(1)x)(?(2)x)(?(3)x)(?(4)x)(?(5)x)

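This example handles a first group of up to 5 characters. To raise the limit, add one more `(\w)?` and one matching `(?(n)x)` for each extra character.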
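
As a rough sketch, the same pattern can be generated for any fixed bound in Python, whose `re` module supports conditional patterns. The helper name `same_length_pattern` and its parameters are hypothetical, and the trailing `$` is an addition not present in the pattern above, included so that surplus repetitions are rejected.

```python
import re


def same_length_pattern(max_len, fill="x"):
    """Build ^(\\w)?...(\\w)? (?(1)x)...(?(n)x)$ for a fixed upper bound."""
    # One optional single-character capture per allowed position.
    captures = r"(\w)?" * max_len
    # One conditional per capture: require `fill` only if that group matched.
    conditionals = "".join(
        f"(?({i}){re.escape(fill)})" for i in range(1, max_len + 1)
    )
    return re.compile(f"^{captures} {conditionals}$")


pattern = same_length_pattern(5)
for text in ["hello xxxxx", "foo xxx", "hello xxxxy", "foo xxy"]:
    print(text, bool(pattern.match(text)))
```

Like the original, this sketch leaves every capture optional, so an empty first group followed by a single space also matches; making the first `(\w)` mandatory would rule that out.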

melpomene
  • Did OP really specify that the repeated character were only going to be `x`? – Olian04 Jun 16 '17 at 11:32
  • Note: I needed a digit pattern to match n and n or n-1 digits with n={1..5}. Example "0234-123" or "123-563". Based on above suggestion I found: `"(\d(\d)?(\d)?(\d)?(\d)?-\d(?(2)\d)?(?(3)\d)(?(4)\d)(?(5)\d))"` – eremmel Feb 12 '19 at 09:17