
This is for a syntax checker (yes, I know regex is not ideal for this). The reader has already detected that it is on the int|float|char|bool part, and now it needs to check whether the declaration and initialization are syntactically valid. The following are sample strings that my condition should pass:

a;
a, _b2;
a, _b2=0;
a=1, _b2=0;
a=1+1, _b2=a+1, c, d=555, e;
a=2.33;
a='a', b=3;
a="asb", b='3';
a=true, b=false, c="false";

Should not pass:

a= , b2 = 1;
a = ;
a = '23;
a = 50, b = a+1
a = a.23;

The condition I've made does not match when it sees =. Could you please help me correct it?

^(\s*[A-z_][A-z0-9]*\s*(=\s*0-9|=\s*"[^]*"|=\s*'[^]*')?\s*,)*\s*[A-z_][A-z0-9]*\s*(=\s*0-9|=\s*"[^]*"|=\s*'[^]*')?\s*;

UPDATE: considered floating values

UPDATE: made it a general regex that is applicable to int, float, char and boolean values

  • What parts are you trying to match/capture? – hwnd Aug 24 '13 at 18:29
  • Never use `A-z` in a character class. Ranges simply check code points (in that case ASCII codes), so `A-z` includes `[`, `\`, `]`, `^`, `_` and `` ` ``. – Martin Ender Aug 24 '13 at 18:40
  • C++ is kind of a horrifyingly difficult language to syntactically validate. At best, your validator will be forced to accept a superset of C++ syntax, leaving the borderline cases to be resolved by an actual compiler. (Oh, and macros will also make life really, really hard...) – nneonneo Aug 24 '13 at 18:48
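
The comment about `A-z` can be checked directly. The following is a minimal sketch in Python (my choice for illustration; the question does not say which regex engine is in use) that lists the ASCII characters sitting between `Z` and `a` and shows which of them each character class accepts. The variable names are hypothetical.

```python
import re

# Characters between 'Z' (code 90) and 'a' (code 97) in ASCII.
between = [chr(code) for code in range(ord("Z") + 1, ord("a"))]
print(between)  # ['[', '\\', ']', '^', '_', '`']

loose = re.compile(r"[A-z]")      # one range spanning both cases
strict = re.compile(r"[A-Za-z]")  # two separate ranges, letters only

# Which of the in-between characters does each class accept?
accepted_by_loose = [ch for ch in between if loose.fullmatch(ch)]
accepted_by_strict = [ch for ch in between if strict.fullmatch(ch)]

print(accepted_by_loose == between)  # True: all six punctuation characters match
print(accepted_by_strict)            # []
```

So with `[A-z]` an identifier such as `a^b` or `[x]` would be treated as made of letters, which is why the explicit `[A-Za-z_]` form is preferable.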

2 Answers


No regex in the world will be powerful enough to parse C++ declarations, for the very simple reason that the grammar is severely context-sensitive (and, in all likelihood, is actually undecidable).

For example, using the IsPrime template defined here, you can write a declaration like

int a = foo<IsPrime<234799>>::typen<1>();

which is syntactically valid if and only if 234799 is prime.

Consider using a different approach to validate C++ (e.g. g++ -fsyntax-only).
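
As a rough sketch of what that could look like from a script: the hypothetical helper `check_declaration` below wraps the declaration in a `main` function and pipes it to `g++ -fsyntax-only -x c++ -`, which reads the source from standard input and produces no output file. It assumes `g++` is on the PATH and returns `None` when it is not. Note that the compiler will, as far as I know, also report type errors (for example assigning a string literal to an `int`), so this is stricter than a purely syntactic check.

```python
import shutil
import subprocess


def check_declaration(decl, type_name="int"):
    """Return True if g++ accepts `type_name decl`, False if it reports
    an error, or None if g++ is not installed."""
    compiler = shutil.which("g++")
    if compiler is None:
        return None
    # Wrap the declaration in a minimal program; `decl` should end with ';'.
    source = f"int main() {{ {type_name} {decl} return 0; }}\n"
    result = subprocess.run(
        [compiler, "-fsyntax-only", "-x", "c++", "-"],
        input=source,
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    for decl in ["a = 1, b = a + 1;", "a = ;", "a = 50, b = a+1"]:
        print(repr(decl), check_declaration(decl))
```

The compiler's stderr (available as `result.stderr` inside the helper) also carries the actual error message, which may be more useful to a beginner than a bare pass/fail.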

nneonneo
  • I admit I shouldn't have tried this regex approach in the first place, but this is for a system intended for C++ beginners, with very basic library support and a main function only, so let me finish it. My own (poor) design is: 1. check for syntax errors by anticipation; 2. create the equivalent JS; 3. run it and give the beginner a trace of their variable values and condition evaluations (something like a tracer). –  Aug 24 '13 at 18:52

As nneonneo mentioned, regex is not suitable for the task, but if you want to match the sample strings you have, you can use this:

^(?:\s*[A-Za-z_][A-Za-z0-9]*\s*(?:=\s*(?:[A-Za-z0-9]+(?:[+\/*-][A-Za-z0-9]+)?|"[^"]*"|'[^']*'))?\s*,)*\s*[A-Za-z_][A-Za-z0-9]*\s*(?:=\s*(?:[A-Za-z0-9]+(?:[+\/*-][A-Za-z0-9]+)?|"[^"]*"|'[^']*'))?\s*;

A couple of things I changed from your regex:

  • Changed [A-z] to [A-Za-z].

  • Put the =\s* 'outside' because it was quite repetitive.

  • Added square brackets to the bare 0-9. I believe it was meant to be a character class.

  • Added letters to the character class [0-9].

  • Changed all the [^] to [^"] and [^'] where appropriate. I'm not too sure what you were trying, but just in case.

  • Added the basic arithmetic operators, followed by digits (or letters, for variables): (?:[+/*-][A-Za-z0-9]+)?.

  • Changed the * in the first character class after = to + to prevent an immediate , after =.

regex101 demo.

EDIT:

^(?:\s*[A-Za-z_][A-Za-z0-9_]*\s*(?:=\s*(?:[A-Za-z0-9_]+(?:\s*[+\/*-]\s*[A-Za-z0-9_]+)*|[0-9]+(?:\.[0-9]+)?(?:\s*[+\/*-]\s*[0-9]+(?:\.[0-9]+)?)*|"[^"]*"|'[^']*'))?\s*,)*\s*[A-Za-z_][A-Za-z0-9_]*\s*(?:=\s*(?:[A-Za-z0-9_]+(?:\s*[+\/*-]\s*[A-Za-z0-9_]+)*|[0-9]+(?:\.[0-9]+)?(?:\s*[+\/*-]\s*[0-9]+(?:\.[0-9]+)?)*|"[^"]*"|'[^']*'))?\s*;$

This version allows more whitespace around operators and allows underscores in variable names.
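
Since the long pattern repeats the same declarator twice, it gets hard to read and to test. Below is a sketch (not the exact regex from this answer) that builds an equivalent-in-spirit pattern from named pieces in Python and runs it against the pass/fail samples from the question. All names (`IDENT`, `NUMBER`, `is_valid`, and so on) are my own for illustration, and the pattern is slightly stricter than the one above in that an operand must be either a number or an identifier, so something like `2a` is rejected.

```python
import re

IDENT = r"[A-Za-z_][A-Za-z0-9_]*"
NUMBER = r"[0-9]+(?:\.[0-9]+)?"
OPERAND = rf"(?:{NUMBER}|{IDENT})"
# One operand, optionally followed by more operands joined by + - * /
EXPR = rf"{OPERAND}(?:\s*[+\-*/]\s*{OPERAND})*"
STRING = r'"[^"]*"'
CHAR = r"'[^']*'"
VALUE = rf"(?:{EXPR}|{STRING}|{CHAR})"
# A single declarator: a name with an optional initializer
DECL = rf"\s*{IDENT}\s*(?:=\s*{VALUE})?\s*"
# Comma-separated declarators terminated by a semicolon
PATTERN = re.compile(rf"{DECL}(?:,{DECL})*;\s*")


def is_valid(text):
    """True if the whole string is a declarator list ending in ';'."""
    return PATTERN.fullmatch(text) is not None


SHOULD_PASS = [
    "a;",
    "a, _b2;",
    "a, _b2=0;",
    "a=1, _b2=0;",
    "a=1+1, _b2=a+1, c, d=555, e;",
    "a=2.33;",
    "a='a', b=3;",
    """a="asb", b='3';""",
    'a=true, b=false, c="false";',
]
SHOULD_FAIL = [
    "a= , b2 = 1;",
    "a = ;",
    "a = '23;",
    "a = 50, b = a+1",
    "a = a.23;",
    "a12 = ,b2, c3 , d4;",
]

if __name__ == "__main__":
    for sample in SHOULD_PASS + SHOULD_FAIL:
        print(f"{sample!r:40} {is_valid(sample)}")
```

Keeping the samples in two lists like this makes it easy to add a counter-example whenever a new string is found that passes or fails unexpectedly.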

Jerry
  • This passes your regex when it should NOT: a12 = ,b2, c3 , d4; –  Aug 24 '13 at 18:58
  • @fireflieslive Well, that's hardly Jerry's fault without you posting some counter-examples in your question, is it? – Martin Ender Aug 24 '13 at 19:02
  • Oh, I didn't mean to... sorry, man ^^ –  Aug 24 '13 at 19:07
  • @fireflieslive I'm not sure exactly what you want to reject, but I made one change to the regex which rejects the two strings you mentioned. – Jerry Aug 24 '13 at 19:10
  • @jerry: one last thing, could you please make your regex pass floating-point values? –  Aug 24 '13 at 20:59
  • @fireflieslive The following is a possible regex: [link](http://regex101.com/r/mN5oU1), but it's not very pretty. Does it fit what you were looking for? – Jerry Aug 25 '13 at 09:19
  • @jerry Which part is not very pretty? Is there a bug? Thanks, it helps me a lot. –  Aug 25 '13 at 09:26
  • @fireflieslive No, just remove the newline in the regex. I have the `x` mode activated just so it's easier to read the regex. Try: `^(?:\s*[A-Za-z_][A-Za-z0-9]*\s*(?:=\s*(?:[A-Za-z0-9]+(?:[+\/*-][A-Za-z0-9]+)?|[0-9]+(?:\.[0-9]+)?(?:[+\/*-][0-9]+(?:\.[0-9]+)?)?|"[^"]*"|'[^']*'))?\s*,)*\s*[A-Za-z_][A-Za-z0-9]*\s*(?:=\s*(?:[A-Za-z0-9]+(?:[+\/*-][A-Za-z0-9]+)?|[0-9]+(?:\.[0-9]+)?(?:[+\/*-][0-9]+(?:\.[0-9]+)?)?|"[^"]*"|'[^']*'))?\s*;` – Jerry Aug 25 '13 at 09:38
  • @jerry: If you're available, could you please give me a hand with http://stackoverflow.com/questions/18432433/using-regex-to-pass-syntax-valid-c-declaration-initialization-considering-the –  Aug 25 '13 at 18:47