
I am writing a program that tokenizes a C program. So far I have handled every case I can think of, with one exception. Suppose a variable is declared like this:

char a[] = "Hello World";

The line gets separated into the tokens `char`, `a`, `[`, `]`, `=`, `"Hello`, `World"` and `;`, because I make sure there are spaces wherever necessary and then split the string on spaces. Is there a way to use a regex to split the string only if it has seen an even number (0 counts as even) of quotation marks since the last split, so that the tokens are `char`, `a`, `[`, `]`, `=`, `"Hello World"` and `;`?
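To make the even-quotes condition concrete, here is a minimal sketch in Python (the language is an assumption; the question does not say what the tokenizer is written in). Instead of counting quotes since the last split, it uses a lookahead to check that an even number of quotes remains to the right of the space, which amounts to the same thing when the quotes on the line are balanced. The name `split_outside_quotes` is made up for illustration, and the pattern assumes there are no escaped quotes (`\"`) or character literals such as `'"'`.

```python
import re

# Split on a run of spaces only when an even number of double quotes
# remains between that point and the end of the line, i.e. the spaces
# are outside any string literal.
# Assumptions: quotes are balanced, no escaped quotes, no '"' char literals.
SPLIT_OUTSIDE_QUOTES = re.compile(r' +(?=(?:[^"]*"[^"]*")*[^"]*$)')

def split_outside_quotes(line):
    """Split an already space-padded line, keeping string literals whole."""
    return [tok for tok in SPLIT_OUTSIDE_QUOTES.split(line) if tok]

tokens = split_outside_quotes('char a [ ] = "Hello World" ;')
print(tokens)
```

The lookahead rescans the rest of the line for every candidate space, so this is only reasonable for short lines.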

Ohunter
  • What is the actual raw string you are trying to split here? The one line of code you have shown us looks like C. – Tim Biegeleisen Jan 19 '20 at 03:16
  • What is the approach you are taking to parsing the code? I would imagine it's very difficult to parse C with just regex. – gph Jan 19 '20 at 04:01
  • Something like `"(?:[^"\\]|\\.)*"` will match a C string literal. – ikegami Jan 19 '20 at 04:09
  • @Tim Biegeleisen, They are trying to parse that line of C code :) – ikegami Jan 19 '20 at 04:10
  • @gph, Indeed. A single regex could be used to validate, but not extract. Regex could be used to tokenize (using a regex for each token, roughly), but that's just the first step of parsing. – ikegami Jan 19 '20 at 04:10
  • To the best of my knowledge I never mentioned parsing, but yes, that is ultimately the next step for the project. As @Nick mentioned, that regex is a step in the right direction, but not a complete answer. – Ohunter Jan 19 '20 at 05:58
  • I have written several lexical analyzers for compilers and not one began with splitting the input on spaces. Is it too late to rethink this? – Booboo Jan 19 '20 at 12:44
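The approach suggested in the comments (roughly one regex per token type, with no splitting on spaces first) could be sketched as follows. This is an illustrative Python sketch, not a complete C lexer: the token classes in `TOKEN_SPEC` and the function name `tokenize` are made up here, and comments, character literals, floating-point numbers and multi-character operators are deliberately left out. The `STRING` pattern is the one ikegami gives above, which also handles escaped quotes.

```python
import re

# Hypothetical token classes for illustration only. Order matters:
# earlier alternatives win when more than one could match.
TOKEN_SPEC = [
    ("STRING", r'"(?:[^"\\]|\\.)*"'),        # string-literal pattern from the comments
    ("IDENT",  r'[A-Za-z_]\w*'),             # identifiers and keywords
    ("NUMBER", r'\d+'),                      # decimal integers only
    ("PUNCT",  r'[\[\](){};,=+\-*/<>&|!]'),  # single-character punctuation
    ("SKIP",   r'\s+'),                      # whitespace separates tokens, then is dropped
    ("ERROR",  r'.'),                        # anything else is reported
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)

def tokenize(source):
    """Return a list of (kind, text) pairs for the given source text."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise ValueError(
                f"unexpected character {match.group()!r} at offset {match.start()}"
            )
        tokens.append((kind, match.group()))
    return tokens

tokens = tokenize('char a[] = "Hello World";')
print(tokens)
```

Because the scanner consumes the input left to right, the string literal is matched as one unit before any whitespace rule sees the space inside it, so no quote counting is needed.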
