14

I have a regular expression that traverses a string and pulls out 40 values. It looks sort of like the pattern below, but is much larger and more complicated:

est(.*)/test>test>(.*)<test><test>(.*)test><test>(.*)/test><test>(.*)/test><test>(.*)/test><test(.*)/test><test>(.*)/test><test>(.*)/test><test>(.*)/test><test>(.*)/test><test>(.*)/test><test>(.*)/test><test>(.*)/test><test>(.*)/test><test>(.*)/test>

My question is: how do I use these captured groups with the replace command when the group number exceeds 9? Whenever I use \10, it returns the value of \1 and then appends a 0 to the end.

Any help would be much appreciated thanks :)

Also I am using UEStudio, but if a different program does it better then no biggie :)

Mofi
Dustin Gamester

5 Answers

16

As pointed out by psycho brm: use $10 instead of \10. I am using Notepad++ and it works beautifully.

Community
Sandokas
6

Most of the simple regex engines used by editors aren't equipped to handle more than 9 capturing groups; it doesn't seem like UltraEdit can. I just tried Notepad++ and it won't even match a regex with 10 groups.

Your best bet, I think, is to write something quick in a scripting language with a decent regex engine, though that wouldn't answer the question as asked.

Here's something in Python:

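import re

# 15 single-character capturing groups, to show that more than 9 are handled
pattern = re.compile(r'(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)')
with open('input.txt', 'r') as f:
    for line in f:
        m = pattern.match(line)
        if m:  # lines shorter than 15 characters do not match
            print(m.groups())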

Note that Python allows backreferences such as \20: in order to have a backreference to group 2 followed by a literal 0, you need to use \g<2>0, which is unambiguous.
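The difference between the two forms can be checked with a minimal sketch like the following. The sample string and variable names are made up for illustration; only `re.compile` and `sub` from the standard library are used.

```python
import re

# Hypothetical sample: 12 letters, one capturing group per character.
text = "abcdefghijkl"
pattern = re.compile(r"(.)" * 12)

# \g<10> always means group 10.
group_ten = pattern.sub(r"\g<10>", text)

# \g<1>0 means group 1 followed by a literal "0".
group_one_then_zero = pattern.sub(r"\g<1>0", text)

# A bare \10 in a Python replacement string is also read as group 10.
bare = pattern.sub(r"\10", text)

print(group_ten, group_one_then_zero, bare)
```

Here group 10 holds "j" and group 1 holds "a", so the first and third replacements yield the same text while the second one does not.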

Edit: Most regex flavors, and editors that include a regex engine, should follow this replace syntax:

abcdefghijklmnop
search: (.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(?<name>.)(.)
note:    1  2  3  4  5  6  7  8  9  10 11 12 13
value:   a  b  c  d  e  f  g  h  i  j  k  l  m
replace result:
    \11      a1      i.e.: group 1, then the character "1"
    ${12}    l       most should support this
    ${name}  l       few support named references, but use them where you can.

Named references are usually supported only by specific flavors of regex libraries; test your tool to know for sure.

Tony Chiboucas
Chris B.
  • "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." - Jamie Zawinski. The above quote never seemed so true :( Thanks for the help :) – Dustin Gamester Jul 22 '10 at 14:26
3

Put a $ in front of the double-digit group number, e.g. \1\2\3\4\5\6\7\8\9$10. It worked for me.

pepita96
2

Try using named groups. So instead of writing the tenth group as:

(.*)

use:

(?<group10>.*)

and then use the following replace string:

${group10}

(That's of course in the absence of a better solution using looping, and remember that there might be different regex syntax flavours depending on your environment.)
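For comparison, here is a rough sketch of the same idea in Python. Note the assumptions: Python's `re` module spells a named group `(?P<name>...)` rather than `(?<name>...)`, and the replacement reference is `\g<name>` rather than `${name}`. The sample text is invented.

```python
import re

# Hypothetical input; the group name "group10" follows the answer above.
text = "abcdefghijkl"

# Nine numbered groups, then a named tenth group (Python syntax: (?P<name>...)).
pattern = re.compile(r"(.)" * 9 + r"(?P<group10>.)")

# \g<group10> is Python's counterpart of ${group10}.
result = pattern.sub(r"[\g<group10>]", text)
print(result)
```

The first ten characters are replaced by the tenth group in brackets, and the trailing "kl" is left alone because it is too short to match again.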

Andrew Jens
  • This worked for me when using regex matching in nginx, which doesn't seem to like matching more than 9 groups. Same issue the OP has where $10 is interpreted as $1 + 0. – theChumpus Aug 19 '16 at 10:33
1

If you cannot handle more than 9 subgroups, why not first match the repeated blocks as a whole, then loop over them and apply a smaller regex to each match?

i.e. first match (<test.*/test>)+ and then for each subgroup match on <test(.*)/test>.
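A minimal sketch of this two-pass approach in Python, under some assumptions: the input string is invented, and the patterns are tightened to `<test>...</test>` with a non-greedy first pass so that neighbouring chunks are not merged. Adapt them to the real data.

```python
import re

# Hypothetical sample data; the tags and values are made up for illustration.
text = "<test>alpha</test><test>beta</test><test>gamma</test>"

# Pass 1: grab each <test>...</test> chunk (non-greedy, so chunks don't merge).
chunks = re.findall(r"<test>.*?</test>", text)

# Pass 2: apply a small one-group regex to every chunk.
values = [re.match(r"<test>(.*)</test>", chunk).group(1) for chunk in chunks]
print(values)
```

Because each pass uses at most one capturing group, the number of extracted values is no longer tied to how many groups the engine can reference.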

Aaron Silverman