C# - Best way to parse XML-like text and perform an action

Question

I have a small text string with XML-like tags inside it:

<sub>A</sub>B<sup>C</sup>

I need to parse this text and perform actions based on the tags. So the above text should appear as _AB^C in my target application (MS Excel; Excel can parse and format this string if I paste it, but not if I just type it into a cell).

What is the best way to parse this type of tag-based text in terms of performance? The formatting code is going to be called very frequently, and I want to minimize the overhead as much as possible. I can think of the following options:

Parse it character by character using the string indexer, keeping track of where each tag starts and ends
Use Regular Expressions
Load it into some XML/HTML DOM Parser and iterate through the nodes

Which one do you think will have the least performance impact? Any other way I can get the task done?

score 4 · Accepted Answer · edited May 23 '17 at 09:58

Do not re-invent the wheel, and especially do not use regular expressions.

Use an existing XML parser.
You should use LINQ to XML.
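As a minimal sketch of what the LINQ to XML approach could look like: the class and method names below (`TagFormatter`, `Format`) are hypothetical, and the mapping of `sub` to `_` and `sup` to `^` is taken from the `_AB^C` example in the question. The fragment is wrapped in an artificial root element, since it has several top-level nodes and would not parse as a document on its own.

```csharp
using System;
using System.Text;
using System.Xml.Linq;

public static class TagFormatter
{
    // Hypothetical helper: turns "<sub>A</sub>B<sup>C</sup>" into "_AB^C".
    public static string Format(string markup)
    {
        // Wrap the fragment in a made-up root so it parses as a single element.
        XElement root = XElement.Parse("<root>" + markup + "</root>");
        var result = new StringBuilder();

        foreach (XNode node in root.Nodes())
        {
            if (node is XText text)
            {
                // Plain text between tags is copied through unchanged.
                result.Append(text.Value);
            }
            else if (node is XElement element)
            {
                switch (element.Name.LocalName)
                {
                    case "sub": result.Append('_').Append(element.Value); break;
                    case "sup": result.Append('^').Append(element.Value); break;
                    // Assumption: unknown tags are reduced to their text content.
                    default: result.Append(element.Value); break;
                }
            }
        }

        return result.ToString();
    }
}
```

With this sketch, `TagFormatter.Format("<sub>A</sub>B<sup>C</sup>")` returns `_AB^C`.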

If you implement that and find it too slow, you can switch to an XmlReader, which will be extremely fast but annoying to work with.
Remember: premature optimization is the root of all evil.
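If profiling does show that the DOM-style approach is too slow, an XmlReader version might look roughly like the sketch below. The names `TagFormatterReader` and `Format` are hypothetical, and the `_` / `^` markers again come from the question's example. `ConformanceLevel.Fragment` lets the reader accept input with several top-level nodes, so no wrapper element is needed.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class TagFormatterReader
{
    // Hypothetical helper: forward-only version of the same conversion.
    public static string Format(string markup)
    {
        // Fragment conformance allows text and multiple elements at the top level.
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        var result = new StringBuilder();

        using (var reader = XmlReader.Create(new StringReader(markup), settings))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        // Emit the marker when the opening tag is seen.
                        if (reader.Name == "sub") result.Append('_');
                        else if (reader.Name == "sup") result.Append('^');
                        break;
                    case XmlNodeType.Text:
                        result.Append(reader.Value);
                        break;
                    // Assumption: whitespace-only nodes and end tags are ignored.
                }
            }
        }

        return result.ToString();
    }
}
```

For the question's input, `TagFormatterReader.Format("<sub>A</sub>B<sup>C</sup>")` returns `_AB^C`. The reader never builds a tree, which is where the speed comes from, at the cost of tracking state by hand.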

answered Jan 24 '11 at 04:14

SLaks

I really wish I could give you more than a +1. – Alastair Pitts Jan 24 '11 at 04:15
