
TL;DR: I am trying to match a pattern only where a given string does not appear.

How can I match a string in which a given word does not appear? I am writing the code in Go, so none of the lookaround constructs are available, and I have to do this in plain regex without relying on external libraries. I believe I could accomplish this easily with a negative lookahead, but I cannot use one. Client-side assertions (i.e. checking whether the first pattern matches and then checking the groups) are not really an option here either.

Here is the sample data. In this data, I would like to capture all instances of new MyClass that do not have new AnotherClass as an argument.

# case 0
return new MyClass(constructor, options, ...);

# case 1
var mc = new MyClass();
# case 2
var mc = new MyClass(new AnotherClass());
# case 3
var amc = new MyClass(options, new AnotherClass(), ...);

# case 4
MyClass mc = new MyClass(
  something
);

# case 8
MyClass mc = new MyClass(
  something, new Render()
);

# case 5
MyClass mc = new MyClass(
  new AnotherClass()
);

# case 6
MyClass mc = new MyClass(
  options, new AnotherClass()
);

# case 7
var amc = new MyClass(new AnotherClass(), options, ...);

So in this case, I want to match cases 0, 1, 4 and 8. So far I can accomplish this with a capture group, using the following regex: (?s)(?:new\sMyClass\([^;]*?new\sAnotherClass\(\).*?\))|(new\sMyClass\([^;]*?\)). Note that I am using the (?s) flag because the data can contain newlines.

The regex that I have now is: regex 101

The example is not quite what I am after, as the match results contain both the desired and the undesired instances. What I am trying to do is match only the new MyClass instances where new AnotherClass is not an argument, without using groups. With groups, I get both kinds of match, and I am trying to narrow it down to one kind; i.e. only the blue matches in regex 101.
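
To make the current behaviour concrete, here is a minimal Go sketch of the group-based approach described above. Only the pattern comes from the question; the helper name wantedCalls and the sample input are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// groupedRe is the pattern from the question, unchanged. The first
// alternative consumes calls that contain "new AnotherClass()" without
// capturing them; the second alternative captures everything else in group 1.
var groupedRe = regexp.MustCompile(`(?s)(?:new\sMyClass\([^;]*?new\sAnotherClass\(\).*?\))|(new\sMyClass\([^;]*?\))`)

// wantedCalls (hypothetical helper) keeps only the matches where group 1
// participated, i.e. the calls without "new AnotherClass()".
func wantedCalls(src string) []string {
	var out []string
	for _, m := range groupedRe.FindAllStringSubmatch(src, -1) {
		if m[1] != "" {
			out = append(out, m[1])
		}
	}
	return out
}

func main() {
	// Illustrative input loosely based on cases 1, 2 and 0.
	src := "var a = new MyClass();\n" +
		"var b = new MyClass(new AnotherClass());\n" +
		"return new MyClass(constructor, options);\n"
	for _, call := range wantedCalls(src) {
		fmt.Printf("%q\n", call)
	}
}
```

Both kinds of call still show up as matches; the filtering happens in the m[1] check, which is exactly the client-side step the question wants to avoid.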

Happy to answer any clarifying question if the question is not clear.

securisec
  • "*client side assertions isnt really an option here also.*" - why not? What exactly are you limited by? And what is your actual goal? It seems you are trying to parse code with regex, which is never a good idea. – Bergi Nov 03 '21 at 04:27
  • @Bergi What I am trying to do is search code with regex, not necessarily parse it. The actual goal is as stated. At the moment, I am able to find all matches I am looking for based on the regex101 link in the OP, but as groups. What I am trying to accomplish is to get the same outcome of the group result, but as a match instead. The matches currently include both what I am trying to capture and what I am trying to avoid. – securisec Nov 03 '21 at 04:32
  • But why are you trying to do this with regex? Why can you not access the group result? Is this just an arbitrary exercise for academic curiosity? Can you share your go code? – Bergi Nov 03 '21 at 04:41
  • No, I cannot share the go code. I believe I have provided sufficient context, examples and examples of what I have tried along with urls to the effort. Why the constraints exist is not the issue I am trying to resolve. The question is not really about why I cannot do xyz with go, but more how can I capture xyz with regex under certain constraints. – securisec Nov 03 '21 at 04:45
  • In that case, the answer will be that it is not possible under those constraints. (Unless you multiply out the state machine of the negative lookahead, of course) – Bergi Nov 03 '21 at 04:49
  • I can fully accept that it might not be possible. But would you share a minor example of `multiply out the state machine` with a foo bar example? It will give me a chance to think about the problem from that solution's perspective then. – securisec Nov 03 '21 at 04:52
  • `(?!ab).*` is equivalent to `|[^a].*|a|a[^b].*` for example. Gets more complicated quickly. – Bergi Nov 03 '21 at 05:06
  • I believe your only option is to first match a regex to determine if `"AnotherClass"` is present, something like `new +MyClass\([^)]*\bAnotherClass\b[^)]*\)`. If that match fails, apply a second regex to match what you want returned. – Cary Swoveland Nov 03 '21 at 05:56
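
The `(?!ab).*` equivalence mentioned in the comments can be checked in Go with a short sketch. The anchors, the non-capturing group, and the helper name startsWithoutAB are assumptions added here so that the alternation has to cover the whole (single-line) input; the four alternatives are the ones from the comment.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// noABPrefix spells out "does not start with ab" without a lookahead:
// the empty string, anything starting with a non-"a" character, a lone "a",
// or "a" followed by a non-"b" character and then anything.
var noABPrefix = regexp.MustCompile(`^(?:|[^a].*|a|a[^b].*)$`)

// startsWithoutAB (hypothetical helper) reports whether s passes the pattern.
func startsWithoutAB(s string) bool {
	return noABPrefix.MatchString(s)
}

func main() {
	// Compare the regex against a plain prefix check on a few inputs.
	for _, s := range []string{"", "a", "ab", "abc", "ac", "ba", "aab", "xyz"} {
		fmt.Printf("%-5q regex=%-5t prefix-check=%t\n",
			s, startsWithoutAB(s), !strings.HasPrefix(s, "ab"))
	}
}
```

Each extra character in the forbidden word adds another level of alternatives, which is why the approach grows quickly for something as long as `new AnotherClass`.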

3 Answers

1

Go's regexp library supports only a limited set of regex features, so the closest solution here is to match new MyClass(...) with no new AnotherClass substring between the parentheses:

(?s)new\sMyClass\((?:[^;n]|n(?:n|e(?:n|w(?:n|\s(?:n|An(?:ew\sAn)*(?:n|o(?:n|t(?:n|h(?:n|e(?:n|r(?:n|C(?:n|l(?:n|a(?:n|sn))))))))|e(?:n|w(?:n|\sn)))))))*(?:[^;en]|e(?:[^;nw]|w(?:[^;\sn]|\s(?:[^;An]|A(?:[^;n]|n(?:ew\sAn)*(?:[^;eno]|o(?:[^;nt]|t(?:[^;hn]|h(?:[^;en]|e(?:[^;nr]|r(?:[^;Cn]|C(?:[^;ln]|l(?:[^;an]|a(?:[^;ns]|s[^;ns]))))))))|e(?:[^;nw]|w(?:[^;\sn]|\s(?:[^;An]|A[^;n]))))))))))*(?:n(?:n|e(?:n|w(?:n|\s(?:n|An(?:ew\sAn)*(?:n|o(?:n|t(?:n|h(?:n|e(?:n|r(?:n|C(?:n|l(?:n|a(?:n|sn))))))))|e(?:n|w(?:n|\sn)))))))*(?:e(?:(?:w(?:\sA?)?)?|w\sAn(?:ew\sAn)*(?:o(?:t?|th(?:e?|er(?:C?|Cl(?:a?|as))))|e(?:w(?:\sA?)?)?)?))?)?\)

See the regex demo.

The details of how to obtain this regex are in the "Regex: match everything but specific pattern" post.
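
The general idea of excluding a substring without lookarounds is easier to see on a tiny forbidden word. The sketch below is hand-built for the two-character word "ab" and is not derived from the pattern above; the pattern, the helper name lacksAB, and the test inputs are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// noAB matches whole strings that never contain "ab":
//   [^a]      a character that cannot start the forbidden word
//   a+[^ab]   a run of "a" that is broken by something other than "b"
//   a*        an optional trailing run of "a" at the very end
var noAB = regexp.MustCompile(`^(?:[^a]|a+[^ab])*a*$`)

// lacksAB (hypothetical helper) reports whether s is free of "ab".
func lacksAB(s string) bool {
	return noAB.MatchString(s)
}

func main() {
	// Compare the regex against strings.Contains on a few inputs.
	for _, s := range []string{"", "a", "aa", "ba", "acb", "ab", "aab", "xaby"} {
		fmt.Printf("%-6q regex=%-5t contains-check=%t\n",
			s, lacksAB(s), !strings.Contains(s, "ab"))
	}
}
```

Every prefix of the forbidden word needs its own branch, so a 16-character word such as `new AnotherClass` leads to the kind of deep nesting seen in the pattern above.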

Wiktor Stribiżew

Using a negative lookahead with a tempered dot is how I would generally do this. In the absence of lookarounds, you could phrase your logic as not matching any MyClass which has an instance of AnotherClass() in it, but also still matching MyClass:

matches:
new MyClass\(.*?\)

does NOT match:
new MyClass\([^)]*AnotherClass\(\).*\)
Tim Biegeleisen
  • I am not sure I follow. The first regex will match everything, including instances which have AnotherClass as an argument. The second regex does not really address the need here. I am not sure if I am missing something very obvious here. – securisec Nov 03 '21 at 03:41
  • @securisec If you are doing this from a programming language, you may assert that the first regex matches but the second one doesn't. If you need just a single regex, I'm afraid you may need to use lookaheads. – Tim Biegeleisen Nov 03 '21 at 03:42
  • Yes, you are right, and although it is in code, the logic to do a test and group match is not quite possible. I will edit my original ask to also share that constraint. Hence the hunt for a complicated pattern. – securisec Nov 03 '21 at 03:44
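
For readers who are free to combine two checks in code, the "first regex matches but the second one doesn't" idea could look roughly like the sketch below. The two patterns are copied from the answer; the helper name isWanted and the one-statement-per-string input are assumptions (without the (?s) flag, `.` does not cross newlines, so multi-line calls are out of scope here).

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Matches any new MyClass(...) call on a single line.
	anyMyClass = regexp.MustCompile(`new MyClass\(.*?\)`)
	// Matches new MyClass(...) calls that contain AnotherClass().
	withAnotherClass = regexp.MustCompile(`new MyClass\([^)]*AnotherClass\(\).*\)`)
)

// isWanted (hypothetical helper) is true when the statement contains a
// MyClass call and the exclusion pattern does not match it.
func isWanted(stmt string) bool {
	return anyMyClass.MatchString(stmt) && !withAnotherClass.MatchString(stmt)
}

func main() {
	stmts := []string{
		"return new MyClass(constructor, options, ...);",           // case 0
		"var mc = new MyClass();",                                  // case 1
		"var mc = new MyClass(new AnotherClass());",                // case 2
		"var amc = new MyClass(options, new AnotherClass(), ...);", // case 3
		"var amc = new MyClass(new AnotherClass(), options, ...);", // case 7
	}
	for _, s := range stmts {
		fmt.Printf("%-5t %s\n", isWanted(s), s)
	}
}
```

This is the client-side assertion that the question rules out, so it is shown only to make the suggestion concrete.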

Match when there are no brackets inside the brackets:

new\sMyClass\([^();]*\)
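
A minimal Go sketch of how this pattern behaves, assuming a hypothetical helper flatCalls and a small input adapted from cases 1, 2, 4 and 8 of the question:

```go
package main

import (
	"fmt"
	"regexp"
)

// flatRe is the pattern above: the argument list may not contain "(", ")"
// or ";". A negated class also matches newlines, so no (?s) flag is needed.
var flatRe = regexp.MustCompile(`new\sMyClass\([^();]*\)`)

// flatCalls (hypothetical helper) returns every match in src.
func flatCalls(src string) []string {
	return flatRe.FindAllString(src, -1)
}

func main() {
	src := "var mc = new MyClass();\n" + // case 1
		"var mc = new MyClass(new AnotherClass());\n" + // case 2
		"MyClass mc = new MyClass(\n  something\n);\n" + // case 4
		"MyClass mc = new MyClass(\n  something, new Render()\n);\n" // case 8
	for _, call := range flatCalls(src) {
		fmt.Printf("%q\n", call)
	}
}
```

The pattern needs no groups and no lookarounds, but it rejects every nested constructor call, so a call like case 8 is skipped along with the AnotherClass ones.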
Bohemian
  • This is awesome! But would it be possible to do so without negating the brackets? I will update my post to include that as there may be other classes being passed as the argument to the MyClass constructor. – securisec Nov 03 '21 at 03:51
  • The regex101 that I shared does take into account that `something` might be `new YetAnotherClass`, as an example – securisec Nov 03 '21 at 03:52
  • Maybe group the "passes" and "fails" separately. Is `AnotherClass` special? eg should `var mc = new MyClass(new YetAnotherClass());` match? – Bohemian Nov 03 '21 at 03:53
  • Grouping is what I have as my pattern at the moment, but I am trying to avoid groups altogether and make it more of a single pattern. To answer your question, yes, it should match if the string is `var mc = new MyClass(new YetAnotherClass(...)` as `AnotherClass` and `YetAnotherClass` are different objects. Here is my working regex which shows the matches I am trying to get in group 1 https://regex101.com/r/cIIU5C/1/. But I am trying to get matches instead of groups if that makes sense. – securisec Nov 03 '21 at 03:55
  • Thank you. This has the negative lookahead which I cannot use as golang stdlib does not support lookarounds as mentioned in the op. – securisec Nov 03 '21 at 04:02
  • @Bohemian, I am not clear about what you mean. Golang doesn't what? Support negative lookaheads? This I know and hence the constraint. I did reflect that in the regex101 tester also where I am picking golang as the lang. – securisec Nov 03 '21 at 04:14