I'm reading a file and need to count the number of logical operators in it. Following suggestions on here, I've tried using regular expressions, but the one I am using:

Regex reg = new Regex(@"/and|x?or|&&|[<>!=]=|[<>&!]|\|{1,2}/gi");

returns everything that matches. For example, it returns any variable with an "or" in it, and if I have a "<=" operator it counts it as two separate operators ("<" and "=" separately).

Should I even use regex at this point? It doesn't seem like it would help in my situation.

Uwe Keim
Commongrate
  • I would ditch the regex and use the Microsoft Compiler Services (Roslyn) to parse and locate the SyntaxKind and SyntaxTokens you are interested in. It is not trivial to learn, but I doubt you'll ever get it perfect with regex. [Here's an article](https://medium.com/@CPP_Coder/introduction-to-roslyn-and-its-use-in-program-development-bce2043fc45d) – Crowcoder Dec 09 '18 at 19:11
  • It seems your regex counts `<=` once. The only thing your regex does not currently handle is word boundaries: for example, it should not match `andd`, but it should match `and0x3`. You can handle that part of the regex with `(?<=\d|\b)(and|x?or)(?=\d|\b)` – M.kazem Akhgary Dec 09 '18 at 19:12
  • @M.kazemAkhgary Could you also exclude hits in comments? – Crowcoder Dec 09 '18 at 19:18
  • @Crowcoder Or inside literal strings. – Uwe Keim Dec 09 '18 at 19:29
  • That seems hard to achieve with regex, especially if you want to consider `/* ... */` type comments. – M.kazem Akhgary Dec 09 '18 at 19:32

1 Answer

To match <= as a single operator rather than as < followed by =, use alternation with the longer alternative listed first:

(<=|<|>=|>)

This tries to match <= first and, if that succeeds, never checks for <. The same applies to >= and >.
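As a rough illustration of the ordered alternation described above, here is a minimal C# sketch. The class name `ComparisonOperatorDemo` and its `Count` helper are made up for this example and are not part of the question's code. It only covers the four comparison operators from the pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ComparisonOperatorDemo
{
    // Hypothetical helper for illustration only.
    // Two-character operators are listed first, so "<=" is consumed whole
    // before the single-character "<" alternative is ever tried.
    private static readonly Regex Comparison = new Regex(@"<=|>=|<|>");

    public static int Count(string input) => Comparison.Matches(input).Count;

    public static void Main()
    {
        Console.WriteLine(Count("a <= b"));          // 1
        Console.WriteLine(Count("a < b && c >= d")); // 2
    }
}
```

Because the regex engine resumes scanning after the end of each match, the = in <= is never visited again as a separate token.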

To match or as an operator but not inside other words, you can use lookarounds to ensure that it stands on its own (adding this pattern to the one above):

(<=|<|>=|>|(?<=[^a-zA-Z])or(?=[^a-zA-Z]))

Try demo.
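One thing worth checking with the lookaround pattern above: a positive lookbehind such as `(?<=[^a-zA-Z])` needs an actual character before the match, so an `or` at the very start (or end) of the input is skipped. A possible variant, shown here only as a sketch, uses negative lookarounds instead. The class and member names below are invented for the comparison.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordOperatorDemo
{
    // Positive lookarounds, as in the pattern above:
    // a non-letter character must exist on both sides of "or".
    public static readonly Regex Positive =
        new Regex(@"(?<=[^a-zA-Z])or(?=[^a-zA-Z])");

    // Negative lookarounds (illustrative variant): no letter may be adjacent,
    // so the start and end of the input also qualify.
    public static readonly Regex Negative =
        new Regex(@"(?<![a-zA-Z])or(?![a-zA-Z])");

    public static int Count(Regex regex, string input) => regex.Matches(input).Count;

    public static void Main()
    {
        Console.WriteLine(Count(Positive, "a or b"));    // 1
        Console.WriteLine(Count(Positive, "for color")); // 0
        Console.WriteLine(Count(Positive, "or b"));      // 0: nothing precedes "or"
        Console.WriteLine(Count(Negative, "or b"));      // 1
    }
}
```

Whether digits and underscores should also count as "part of a word" depends on the language being scanned, which the question does not state.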

Michał Turczyn
  • What if he has these characters (like `<`, `=`, etc.) inside literal strings or comments? To me, using Regex to parse source code is the same [XY problem](http://xyproblem.info) approach as using [Regex to parse HTML](https://stackoverflow.com/a/1732454/107625). – Uwe Keim Dec 09 '18 at 19:26
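
A partial workaround for the string and comment concern raised in the comments is to let the regex match comments and string literals first and then ignore those matches. The sketch below makes assumptions the question does not confirm: `//` and `/* ... */` comments, double-quoted strings with backslash escapes, and identifiers made of letters, digits, and underscores. The class name `OperatorCounter`, the group names, and the reduced operator list are illustrative. Note also that .NET has no `/.../gi` wrapper syntax, so case-insensitivity is requested through `RegexOptions.IgnoreCase`. A real parser remains the more robust route.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class OperatorCounter
{
    // Assumed syntax (not stated in the question): // and /* */ comments,
    // double-quoted strings with backslash escapes.
    private static readonly Regex Token = new Regex(
        @"(?<skip>//[^\n]*|/\*.*?\*/|""(?:\\.|[^""\\])*"")" +
        @"|(?<op>&&|\|\||[<>!=]=|[<>!]|(?<![A-Za-z0-9_])(?:and|x?or)(?![A-Za-z0-9_]))",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Counts only the matches where the "op" group took part. Comments and
    // strings are consumed by "skip", so their contents are never inspected.
    public static int Count(string source) =>
        Token.Matches(source).Cast<Match>().Count(m => m.Groups["op"].Success);

    public static void Main()
    {
        string source =
            "if (a <= b && s == \"x or y\") // and more\n/* a < b */ c = a or b;";
        Console.WriteLine(Count(source)); // 4: <=, &&, ==, or
    }
}
```

The `skip` alternatives come first, so when the scan reaches a quote or comment opener the whole literal or comment is consumed as one match and the operators inside it are never counted.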