4

Is there a regex that matches a string only when it starts on an odd or an even index? My use case is a hex string in which I want to replace certain "bytes".

Now, when trying to match 20 (space), 20 in "7209" would be matched as well even though it consists of the bytes 72 and 09. I am restricted to the regex implementation of Notepad++ in this case, so I'm not able to check the match index as e.g. in Java.

My sample input looks like:

324F8D8A20561205231920

I set up a testing page here. The regex should match only the first and the last occurrence of 20, since the one in the middle starts on an odd index.

Wiktor Stribiżew
RikuXan

3 Answers

4

You can use the following regex to match 20 at even positions inside a hex string:

20(?=(?:[\da-fA-F]{2})*$)

See demo

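This assumes the string contains only hex digits (no spaces) and has an even length, since the look-ahead counts pairs of digits back from the end of the string.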
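As a quick check outside Notepad++, the look-ahead above can be tried in JavaScript, whose regex syntax accepts the same pattern. This is only a sketch: the `XX` replacement and the variable names are placeholders chosen for illustration, and the input is the sample string from the question.

```javascript
// Sample input from the question; assumed to be hex digits only, even length.
const input = "324F8D8A20561205231920";

// Match "20" only when a whole number of hex-digit pairs follows it
// up to the end of the string.
const alignedSpace = /20(?=(?:[\da-fA-F]{2})*$)/g;

// "XX" is a placeholder replacement used for illustration.
const replaced = input.replace(alignedSpace, "XX");

console.log(replaced); // 324F8D8AXX5612052319XX
```

The 20 inside `1205` is left alone because an odd number of digits follows it.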

In case you have spaces between the values (or any other symbols), this could be an alternative (with $1XX-like replacement string):

((?:.{2})*?)20

See another demo
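A small JavaScript sketch of how this second pattern behaves with a `$1XX` replacement. It handles the sample input, but because the pattern has no anchor, the engine can restart at an odd index when no aligned 20 is found from the current position, so it should be treated with care. The variable names and the `XX` placeholder are illustrative assumptions.

```javascript
// Lazily consume pairs of characters, then require "20".
const pairsThenSpace = /((?:.{2})*?)20/g;

// Sample input from the question: both aligned occurrences are replaced.
const aligned = "324F8D8A20561205231920".replace(pairsThenSpace, "$1XX");
console.log(aligned); // 324F8D8AXX5612052319XX

// No aligned "20" here: the engine retries from index 1 and matches there.
const misaligned = "7209".replace(pairsThenSpace, "$1XX");
console.log(misaligned); // 7XX9
```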

Wiktor Stribiżew
  • Why the disjunction at the end? Can't it be simpler, like so: `20(?=(?:[\da-fA-F]{2})*$)` – Bram Vanroy Jun 30 '15 at 21:11
  • @BramVanroy: It is true that `20(?=(?:[\da-fA-F]{2})+$|$)` is equivalent to `20(?=(?:[\da-fA-F]{2})*$)`. I did not notice the sample input string at the beginning and was working on something different; the disjunction was a remnant of that earlier effort. Thank you for noticing. The overhead was only one additional step, though. – Wiktor Stribiżew Jun 30 '15 at 21:14
1

This seems to work for evens:

rx <- "^(.{2})*(20)"

strings <- c("7209","2079","9720")

grepl(rx,strings) # [1] FALSE  TRUE  TRUE
C8H10N4O2
  • It just matches everything for me, in Notepad++, as well as in Regex101 (https://regex101.com/r/eV7rR9/1) – RikuXan Jun 30 '15 at 20:19
  • Define odd and even -- are you starting your index at zero or one? – C8H10N4O2 Jun 30 '15 at 20:20
  • If the index starts at 0, I want my matches to start at an even index (0, 2, 4...), for a 1-index it would be odd. – RikuXan Jun 30 '15 at 20:22
  • Right, it matches the string because there's a 20 at position 20 (even index). Did you want to capture the whole string, or just the 20? – C8H10N4O2 Jun 30 '15 at 20:25
  • I want to capture just the 20 (and every even-indexed 20 afterwards) to be able to replace it afterwards – RikuXan Jun 30 '15 at 20:26
  • Now it only matches the last 20 in my test string, not the first one (see https://regex101.com/r/cN6fE2/1) – RikuXan Jun 30 '15 at 20:29
  • Right, you might need to repeat in a non-capturing group http://stackoverflow.com/questions/3512471/non-capturing-group – C8H10N4O2 Jun 30 '15 at 20:33
1

I'm not sure which regex engine Notepad++ uses; it's been a while since I used it. This works in JavaScript:

/^(?:..)*?(20)/

...

/^     # start regex
(?:    # non capturing group
..     # any character (two times)
)*?    # close group, and repeat zero or more times, un-greedily
(20)   # capture `20` in group
/      # end regex
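Where a scripting language is available instead of Notepad++, the same "consume two characters at a time" idea can be applied directly, without anchoring. The sketch below is a different technique from the regex above: it walks the string pair by pair and compares each pair to the target byte. `replaceByte` is a hypothetical helper name and `XX` a placeholder replacement.

```javascript
// Hypothetical helper (not from the answers): swap one byte for another,
// looking only at pairs that start on an even index.
function replaceByte(hex, byte, replacement) {
  // /../g consumes the string two characters at a time from index 0,
  // so every callback invocation sees exactly one aligned pair.
  return hex.replace(/../g, (pair) => (pair === byte ? replacement : pair));
}

console.log(replaceByte("324F8D8A20561205231920", "20", "XX"));
// 324F8D8AXX5612052319XX
```

Because the pairs never overlap, a misaligned occurrence such as the 20 in `7209` is never compared against the target.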
Billy Moon
  • Your regex matches only the last occurrence of 20 for me, even in the JS engine (see https://regex101.com/r/rF7bS1/2) – RikuXan Jun 30 '15 at 20:28
  • It kind of worked after removing the ^ (with it, only the first occurrence is matched, even with the global flag on), but it does not seem completely robust, as it breaks here (https://regex101.com/r/hY8kK7/3) – RikuXan Jun 30 '15 at 20:37
  • It will break without an anchor: `^` or `$` is necessary, but without a variable-width look-behind it makes no sense to use `^` as the reference point. I chose `$` since look-ahead is not restricted to a fixed width in the Notepad++ regex flavor. – Wiktor Stribiżew Jun 30 '15 at 20:43