
I have a PCRE regex that I am trying to convert to RE2. Here is the PCRE pattern and a string to match against.

\%(?!$|\W)

It matches only the % itself, and only when the % is not followed by a non-word character (such as !) or the end of the string.

%252525253E%252553Csvg%25252525252525252Fonload%252525252525252525252525252525252525252525252525252525253Dalert(document.domain)%252525252

Result: % % % %

My best conversion is this:

\%[^!$|\W]

Result: %2 %3 %3 %2 %3 %3

This, however, also matches the first digit after the %, which I do not want; I'd like it to behave exactly like the PCRE version. This is where I test:

regex-golang DOT appspot DOT com/assets/html/index.html

regex101 DOT com

Any help will be appreciated.

Ivan Ivanov
  • Try `%(\w)` and replace with `\1` if you need to remove `%` – Wiktor Stribiżew Aug 30 '17 at 17:52
  • Are you trying to obtain [this](https://regex101.com/r/VKhSk7/1)? – Wiktor Stribiżew Aug 30 '17 at 19:05
  • Thanks, but that will not work for me. I am trying to achieve exactly what the PCRE version does. What you showed me will not work if I have > or < or ! or *. Also, \% is a literal %; without the escape the interpreter will be confused. The string above is just an example. We can have something like 'buddy%'>0 and 'friend%'!=1, and we do not want this to match. – Ivan Ivanov Aug 30 '17 at 19:53
  • `%` is not a special regex char, and it should not be escaped. What I showed will always work since all regex flavors have support for capturing groups, but your question is unclear: what are the requirements? Match any `%` before a word char and remove it? If yes, my suggestion above is the solution. Else, please explain your requirements in the question body. `re2` does not support lookarounds, so the only viable work around is using the capturing groups. – Wiktor Stribiżew Aug 30 '17 at 19:56
  • I see you are not suggesting replacing \W with \1 in the regex. I do not want to remove the %; I just need to match on it, as I explained above, with a negative match on special chars. Again, your solution will not work: it is not a negative match, which PCRE calls a negative lookahead, (?!…) – Ivan Ivanov Aug 30 '17 at 20:01
  • And this is why I asked the question here so i can have some help – Ivan Ivanov Aug 30 '17 at 20:04

1 Answer


You could try something like this:

(\%)(?:[^!$|\W])

Since Go's regexp package (RE2) does not support negative lookahead, you could match the following character in a non-capturing group instead and read the % from the first capturing group (e.g. matches[1] and not matches[0]): https://regex101.com/r/THTWwB/2

EDIT: A more detailed example in golang to help you understand the above regex is the following:

package main

import (
    "fmt"
    "regexp"
)

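func main() {
    // Group 1 captures only the %; the non-capturing group consumes the next word character.
    r := regexp.MustCompile(`(\%)(?:[^!$|\W])`)
    // -1 asks for all matches; each element of m is one match with its submatches.
    m := r.FindAllStringSubmatch(`%252525253E%252553Csvg%25252525252525252Fonload%252525252525252525252525252525252525252525252525252525253Dalert(document.domain)%252525252`, -1)
    fmt.Printf("%#v\n", m)
}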

In this example you can access your % through the first capturing group. For example, m[0][0] will be %2, but m[0][1] will be just % (the first capturing group). Note that the first index is the index of the match: the first match is stored in m[0][], the second in m[1][], and so on.

G.Margaritis
  • This will give me what I already have with \%[^!$|\W]: it will get the digits as well. I need to exclude the digits, but if I try (\%)(?:[^!$|\W|\d]) it just doesn't match the % anymore. – Ivan Ivanov Aug 31 '17 at 15:56
  • @Ivan Ivanov As I said, there are no lookaheads in re2 and they will probably never be implemented. So you cannot achieve exactly the same result as with PCRE. But with the solution I provided you can extract your matches. Take a look at a similar discussion here: https://stackoverflow.com/questions/30305542/using-positive-lookahead-regex-with-re2 – G.Margaritis Aug 31 '17 at 20:45
  • Thanks Margaritis. In that case this will work. That explains better. – Ivan Ivanov Sep 01 '17 at 20:39