I am new to C# and have a question about parsing strings. I have a file that contains some lines such as

PC: SWITCH_A == ON

or

PC: defined(SWITCH_B) && SWITCH_C == OFF

All the operators (==, &&, defined) are strings here, and all the switch names (SWITCH_A) and their values (OFF) are identifiers. How do I parse these kinds of strings? Do I first have to tokenize them, splitting them by newlines or whitespace, and then build an abstract syntax tree to parse them? Also, do I need to store all the identifiers in a dictionary first? I have no experience with parsing; can anyone help and show me, with an example, how to do it and which classes and methods I should write? Thanks.

-
If the lines are reasonably consistent, the simplest approach might be to read each line in, and then perform a string.Split(' '). This will return you an array of strings, which you can then parse accordingly. – Hywel Rees Jun 17 '16 at 09:17
-
See my SO answer on how to build a recursive-descent parser by hand. http://stackoverflow.com/questions/2245962/is-there-an-alternative-for-flex-bison-that-is-usable-on-8-bit-embedded-systems/2336769#2336769 This is pretty easy for expressions. – Ira Baxter Jun 17 '16 at 11:16
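As a quick check of the string.Split(' ') suggestion, here is a minimal sketch (the SplitDemo class is hypothetical). Splitting on spaces separates the operators, but it leaves the colon attached to PC and the parentheses attached to defined, so the pieces still need further tokenizing.

```csharp
using System;

static class SplitDemo
{
    // Splits a line on spaces, as the comment suggests; RemoveEmptyEntries
    // drops the empty strings that repeated spaces would otherwise produce.
    public static string[] SplitOnSpaces(string line) =>
        line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
}
```

For the second example line, SplitOnSpaces returns PC:, defined(SWITCH_B), &&, SWITCH_C, == and OFF.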
1 Answer
Unfortunately, yes. If the syntax you are parsing is custom rather than a standard syntax for which a compiler already exists, you have to tokenize it yourself.
You could take advantage of Expression Trees (System.Linq.Expressions). The .NET Framework provides them for building and evaluating dynamic languages.
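To make the Expression Trees suggestion concrete, here is a minimal sketch using System.Linq.Expressions that builds the two example lines by hand. It assumes the switch values live in a Dictionary<string, string> (for example SWITCH_A mapped to "ON") and that defined(X) means X has an entry in that dictionary; both are assumptions, and SwitchExpressions and its methods are hypothetical names. A real parser would create these nodes while reading tokens instead of hard-coding them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

static class SwitchExpressions
{
    // Reads env[name]; the Dictionary indexer is exposed to expression trees as the property "Item".
    static Expression Lookup(ParameterExpression env, string name) =>
        Expression.Property(env, "Item", Expression.Constant(name));

    // Builds and compiles  env => env[name] == value  (e.g. SWITCH_A == ON).
    public static Func<Dictionary<string, string>, bool> SwitchEquals(string name, string value)
    {
        var env = Expression.Parameter(typeof(Dictionary<string, string>), "env");
        Expression body = Expression.Equal(Lookup(env, name), Expression.Constant(value));
        return Expression.Lambda<Func<Dictionary<string, string>, bool>>(body, env).Compile();
    }

    // Builds and compiles  env => env.ContainsKey(definedName) && env[name] == value,
    // i.e. defined(SWITCH_B) && SWITCH_C == OFF.
    public static Func<Dictionary<string, string>, bool> DefinedAndEquals(
        string definedName, string name, string value)
    {
        var env = Expression.Parameter(typeof(Dictionary<string, string>), "env");
        // Assumption: defined(X) means "X has an entry in the switch table".
        Expression defined = Expression.Call(
            env,
            typeof(Dictionary<string, string>).GetMethod("ContainsKey"),
            Expression.Constant(definedName));
        Expression equals = Expression.Equal(Lookup(env, name), Expression.Constant(value));
        // AndAlso is the short-circuiting form of &&, so the lookup only runs if defined is true.
        Expression body = Expression.AndAlso(defined, equals);
        return Expression.Lambda<Func<Dictionary<string, string>, bool>>(body, env).Compile();
    }
}
```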
To start parsing the syntax, you need a grammar document that describes every possible form a line can take. After that, you can start parsing the lines and building your expression tree.
Parsing source code typically proceeds one character at a time, since each character can change the meaning of the piece being parsed.
So I suggest you start with a grammar document for your syntax and then start writing your parser.
Make sure there isn't already an existing parser for the syntax you are trying to parse, as these kinds of projects tend to be error-prone and time-consuming.
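Reading one character at a time and grouping characters into tokens is the job of a lexer. Below is a minimal hand-written one for the lines in the question, under stated assumptions: the only operators are && and ==, defined is produced as an ordinary Identifier token (the parser decides it is a keyword), and TokenKind, Token and Lexer are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token kinds for the PC: lines in the question.
enum TokenKind { Identifier, Integer, AndAnd, EqualEqual, LParen, RParen, Colon, End }

sealed class Token
{
    public TokenKind Kind { get; }
    public string Text { get; }
    public Token(TokenKind kind, string text) { Kind = kind; Text = text; }
    public override string ToString() => $"{Kind}:{Text}";
}

static class Lexer
{
    // Walks the input one character at a time and groups characters into tokens.
    public static List<Token> Tokenize(string input)
    {
        var tokens = new List<Token>();
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            if (char.IsWhiteSpace(c)) { i++; continue; }

            // Identifiers such as SWITCH_A, ON and defined.
            if (char.IsLetter(c) || c == '_')
            {
                int start = i;
                while (i < input.Length && (char.IsLetterOrDigit(input[i]) || input[i] == '_')) i++;
                tokens.Add(new Token(TokenKind.Identifier, input.Substring(start, i - start)));
                continue;
            }

            // Integer values.
            if (char.IsDigit(c))
            {
                int start = i;
                while (i < input.Length && char.IsDigit(input[i])) i++;
                tokens.Add(new Token(TokenKind.Integer, input.Substring(start, i - start)));
                continue;
            }

            // Two-character operators need one character of look-ahead.
            if (c == '&' && i + 1 < input.Length && input[i + 1] == '&')
            {
                tokens.Add(new Token(TokenKind.AndAnd, "&&")); i += 2; continue;
            }
            if (c == '=' && i + 1 < input.Length && input[i + 1] == '=')
            {
                tokens.Add(new Token(TokenKind.EqualEqual, "==")); i += 2; continue;
            }

            switch (c)
            {
                case '(': tokens.Add(new Token(TokenKind.LParen, "(")); break;
                case ')': tokens.Add(new Token(TokenKind.RParen, ")")); break;
                case ':': tokens.Add(new Token(TokenKind.Colon, ":")); break;
                default: throw new FormatException($"Unexpected character '{c}' at position {i}");
            }
            i++;
        }
        tokens.Add(new Token(TokenKind.End, ""));
        return tokens;
    }
}
```

For PC: defined(SWITCH_B) && SWITCH_C == OFF it produces eleven tokens: PC, :, defined, (, SWITCH_B, ), &&, SWITCH_C, ==, OFF and a final End marker.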
Now, since your high-level grammar is

Expression ::= Identifier | IntegerValue | BooleanExpression

Identifier and IntegerValue are constant literals in the source, so you need to start by looking for a BooleanExpression.
To find a BooleanExpression you need to look for either a BooleanBinaryExpression, a BooleanUnaryExpression, a TrueExpression or a FalseExpression.
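These alternatives map naturally onto one class per grammar rule. The sketch below is one possible shape for the abstract syntax tree, not the only one: all class names are hypothetical, and it assumes switch values are looked up in a dictionary of switch name to value at evaluation time. It further assumes that an identifier that is not a known switch (such as ON or OFF) evaluates to its own name, and that defined(X) means X is present in the dictionary.

```csharp
using System.Collections.Generic;

// One class per grammar rule; Evaluate looks identifiers up in a switch table.
abstract class Expr
{
    public abstract object Evaluate(IReadOnlyDictionary<string, string> switches);
}

sealed class IdentifierExpr : Expr
{
    public string Name { get; }
    public IdentifierExpr(string name) { Name = name; }
    // Assumption: a known switch yields its value; any other identifier (ON, OFF) yields its own name.
    public override object Evaluate(IReadOnlyDictionary<string, string> switches) =>
        switches.TryGetValue(Name, out string value) ? value : Name;
}

sealed class IntegerExpr : Expr
{
    public int Value { get; }
    public IntegerExpr(int value) { Value = value; }
    public override object Evaluate(IReadOnlyDictionary<string, string> switches) => Value;
}

sealed class TrueExpr : Expr
{
    public override object Evaluate(IReadOnlyDictionary<string, string> switches) => true;
}

sealed class FalseExpr : Expr
{
    public override object Evaluate(IReadOnlyDictionary<string, string> switches) => false;
}

// BooleanBinaryExpression: AndExpression
sealed class AndExpr : Expr
{
    public Expr Left { get; }
    public Expr Right { get; }
    public AndExpr(Expr left, Expr right) { Left = left; Right = right; }
    // Short-circuits like C#'s own &&; throws InvalidCastException if an operand is not Boolean.
    public override object Evaluate(IReadOnlyDictionary<string, string> switches) =>
        (bool)Left.Evaluate(switches) && (bool)Right.Evaluate(switches);
}

// BooleanBinaryExpression: EqualsExpression
sealed class EqualsExpr : Expr
{
    public Expr Left { get; }
    public Expr Right { get; }
    public EqualsExpr(Expr left, Expr right) { Left = left; Right = right; }
    // Compares the evaluated operands by value.
    public override object Evaluate(IReadOnlyDictionary<string, string> switches) =>
        object.Equals(Left.Evaluate(switches), Right.Evaluate(switches));
}

// BooleanUnaryExpression: DefinedExpression
sealed class DefinedExpr : Expr
{
    public string Name { get; }
    public DefinedExpr(string name) { Name = name; }
    // Assumption: defined(X) is true when X has an entry in the switch table.
    public override object Evaluate(IReadOnlyDictionary<string, string> switches) =>
        switches.ContainsKey(Name);
}
```

For example, defined(SWITCH_B) && SWITCH_C == OFF becomes new AndExpr(new DefinedExpr("SWITCH_B"), new EqualsExpr(new IdentifierExpr("SWITCH_C"), new IdentifierExpr("OFF"))), which evaluates to true when the dictionary contains SWITCH_B and maps SWITCH_C to OFF.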
You can detect a BooleanBinaryExpression by looking for the && or == operators and then taking the left and right operands.
To detect a BooleanUnaryExpression you need to look for the word defined and then parse the identifier in the parentheses.
And so on...
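The detection steps above can be sketched directly with string methods: split at the operator that binds loosest, recurse on both sides, and handle defined(...) and single operands at the bottom. This is a deliberately naive sketch under stated assumptions: the SplitParser name is hypothetical, it assumes == binds tighter than && (as in C#), it only considers operators outside parentheses, and it returns the tree as an S-expression string instead of building node objects.

```csharp
using System;

static class SplitParser
{
    // Index of the last occurrence of op outside any parentheses, or -1 if there is none.
    // Taking the last one makes a && b && c group as (a && b) && c.
    static int FindTopLevel(string text, string op)
    {
        int depth = 0, found = -1;
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '(') depth++;
            else if (text[i] == ')') depth--;
            else if (depth == 0 && i + op.Length <= text.Length
                     && string.CompareOrdinal(text, i, op, 0, op.Length) == 0)
                found = i;
        }
        return found;
    }

    // A leaf must be a single Identifier or IntegerValue with no spaces, parentheses or operators.
    static string CheckOperand(string text)
    {
        if (text.Length == 0 || text.IndexOfAny(new[] { ' ', '(', ')', '&', '=' }) >= 0)
            throw new FormatException($"Cannot parse '{text}'");
        return text;
    }

    // Returns the parse tree as an S-expression, e.g. "(== SWITCH_A ON)".
    public static string Parse(string text)
    {
        text = text.Trim();
        // BooleanBinaryExpression: try && first so that == ends up binding tighter.
        foreach (string op in new[] { "&&", "==" })
        {
            int at = FindTopLevel(text, op);
            if (at >= 0)
            {
                string left = Parse(text.Substring(0, at));
                string right = Parse(text.Substring(at + op.Length));
                return $"({op} {left} {right})";
            }
        }
        // BooleanUnaryExpression: defined(Identifier)
        const string keyword = "defined";
        if (text.StartsWith(keyword, StringComparison.Ordinal))
        {
            string rest = text.Substring(keyword.Length).Trim();
            if (rest.StartsWith("(", StringComparison.Ordinal) && rest.EndsWith(")", StringComparison.Ordinal))
                return $"(defined {CheckOperand(rest.Substring(1, rest.Length - 2).Trim())})";
        }
        return CheckOperand(text);
    }
}
```

SplitParser.Parse("defined(SWITCH_B) && SWITCH_C == OFF") returns (&& (defined SWITCH_B) (== SWITCH_C OFF)). Because it rescans substrings at every level, this approach is easy to write but reports errors vaguely; a tokenizer followed by a recursive-descent parser scales better.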
Notice that your grammar is recursive: the definitions of AndExpression and EqualsExpression point back to Expression:
AndExpression ::= Expression '&&' Expression
EqualsExpression ::= Expression '==' Expression
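Because AndExpression and EqualsExpression start with Expression, a hand-written recursive-descent parser cannot use them as written: the method for Expression would end up calling itself again before consuming any input. The usual fix is to rewrite the left recursion as a loop. The sketch below does that under stated assumptions: == binds tighter than && (as in C#), the PC: prefix has already been stripped, TrueExpression and FalseExpression are left out because the grammar does not say how they are spelled, and the tree is returned as an S-expression string to keep the example self-contained. ConditionParser and its token pattern are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Left-recursion-free rewrite of the grammar, assuming == binds tighter than &&:
//   Expression       ::= EqualsExpression { '&&' EqualsExpression }
//   EqualsExpression ::= Primary [ '==' Primary ]
//   Primary          ::= 'defined' '(' Identifier ')' | Identifier | IntegerValue
sealed class ConditionParser
{
    // One token (&&, ==, a parenthesis, an identifier or an integer) plus surrounding whitespace.
    static readonly Regex TokenPattern = new Regex(@"\s*(&&|==|\(|\)|[A-Za-z_]\w*|\d+)\s*");

    readonly List<string> tokens = new List<string>();
    int pos;

    ConditionParser(string input)
    {
        for (int index = 0; index < input.Length;)
        {
            Match m = TokenPattern.Match(input, index);
            // The next token must start exactly where the previous one ended.
            if (!m.Success || m.Index != index)
                throw new FormatException($"Unexpected character at position {index}");
            tokens.Add(m.Groups[1].Value);
            index += m.Length;
        }
    }

    string Peek() => pos < tokens.Count ? tokens[pos] : null;

    string Next() => pos < tokens.Count ? tokens[pos++] : throw new FormatException("Unexpected end of input");

    void Expect(string expected)
    {
        string actual = Next();
        if (actual != expected)
            throw new FormatException($"Expected '{expected}' but found '{actual}'");
    }

    static bool IsIdentifier(string token) => char.IsLetter(token[0]) || token[0] == '_';

    // Expression ::= EqualsExpression { '&&' EqualsExpression }
    // The loop replaces "Expression '&&' Expression" and builds a left-associative chain.
    string ParseExpression()
    {
        string left = ParseEquals();
        while (Peek() == "&&")
        {
            Next();
            string right = ParseEquals();
            left = $"(&& {left} {right})";
        }
        return left;
    }

    // EqualsExpression ::= Primary [ '==' Primary ]
    string ParseEquals()
    {
        string left = ParsePrimary();
        if (Peek() != "==") return left;
        Next();
        string right = ParsePrimary();
        return $"(== {left} {right})";
    }

    // Primary ::= 'defined' '(' Identifier ')' | Identifier | IntegerValue
    string ParsePrimary()
    {
        string token = Next();
        if (token == "defined")
        {
            Expect("(");
            string name = Next();
            if (!IsIdentifier(name))
                throw new FormatException($"Expected an identifier but found '{name}'");
            Expect(")");
            return $"(defined {name})";
        }
        if (IsIdentifier(token) || char.IsDigit(token[0]))
            return token;
        throw new FormatException($"Unexpected token '{token}'");
    }

    // Parses one condition and returns its tree as an S-expression string.
    public static string Parse(string input)
    {
        var parser = new ConditionParser(input);
        string tree = parser.ParseExpression();
        if (parser.pos < parser.tokens.Count)
            throw new FormatException($"Unexpected token '{parser.tokens[parser.pos]}'");
        return tree;
    }
}
```

ConditionParser.Parse("defined(SWITCH_B) && SWITCH_C == OFF") returns (&& (defined SWITCH_B) (== SWITCH_C OFF)). Building node objects instead of strings only means replacing each interpolated string with a constructor call.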
The String class in the .NET Framework has many methods that can help you detect and parse the constructs in your grammar.
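For example, the PC: prefix in the question's lines can be stripped with nothing more than StartsWith, Substring and Trim before the condition is handed to a parser. This is a small sketch: ConditionLines and ExtractConditions are hypothetical names, and it assumes the whole file has already been read into a string (File.ReadAllText would do that).

```csharp
using System;
using System.Collections.Generic;

static class ConditionLines
{
    // Returns the text after "PC:" for every line that starts with that prefix.
    public static List<string> ExtractConditions(string fileText)
    {
        const string prefix = "PC:";
        var conditions = new List<string>();
        foreach (string rawLine in fileText.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None))
        {
            string line = rawLine.Trim();
            if (line.StartsWith(prefix, StringComparison.Ordinal))
                conditions.Add(line.Substring(prefix.Length).Trim());
        }
        return conditions;
    }
}
```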
Another alternative is to use a parser generator that targets C#, for example ANTLR.

-
Thanks, but I am not clear about where to start; that is my main concern for now – user5440565 Jun 17 '16 at 09:19
-
Start by finding a document that describes all the possible cases of the syntax in each line, if you don't already have one. If you do, you can start writing your parser. – Timothy Ghanem Jun 17 '16 at 09:20
-
I already have a grammar for this kind of thing, i.e.
Expression ::= Identifier | IntegerValue | BooleanExpression
Identifier ::=
IntegerValue ::=
BooleanExpression ::= BooleanBinaryExpression | BooleanUnaryExpression | TrueExpression | FalseExpression
BooleanBinaryExpression ::= AndExpression | EqualsExpression
AndExpression ::= Expression '&&' Expression
EqualsExpression ::= Expression '==' Expression
BooleanUnaryExpression ::= DefinedExpression
DefinedExpression ::= 'defined' '(' Identifier ')'
– user5440565 Jun 17 '16 at 09:26
-
@user5440565 You'll have problems with your current grammar, as it seems to have indirect left recursion: `Expression -> BooleanExpression -> BooleanBinaryExpression -> AndExpression -> Expression`. Moreover, I strongly suggest using a tool like ANTLR instead of implementing this yourself, unless it is for academic purposes. You will save yourself a serious headache. – Patrice Gahide Jun 17 '16 at 12:40
-
And now I see that Timothy more or less pointed that out already ;) – Patrice Gahide Jun 17 '16 at 12:44
-
Yes, this is academic and I have to do this; I cannot avoid it... Thanks for the suggestion, @PatriceGahide – user5440565 Jun 17 '16 at 13:26
-