I have broken XML like this:

<root>
   <Abc Dfg Xyz>data data data</Abc Dfg Xyz>
   <Kmn fsd>data data</Kmn fsd>
   <Aa bb/>
</root>    

How can I use Regex.Replace to replace whitespace with underscores in the node names, so that the XML becomes well-formed, while leaving the whitespace in the data untouched?

I need a document like this:

<root>
   <Abc_Dfg_Xyz>data data data</Abc_Dfg_Xyz>
   <Kmn_fsd>data data</Kmn_fsd>
   <Aa_bb/>
</root>

Thanks in advance.

Pavlo Hermanov

1 Answer

It isn't a good idea to parse XML with regexes unless you understand your data, but I would argue that in some limited cases it can be very helpful. @HighCore, see this answer to the same question.

We're not trying to handle all possible input in the world; we're trying to make something that works in a specific case. So, if you know that < and > never appear in the data and only delimit the tags, you can use a regex.

In C#, use a MatchEvaluator like so:

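using System.Text.RegularExpressions;

class MyReplacer {
   // Called once per match; replaces the spaces inside the matched tag.
   public string ReplaceSpaces(Match m)
   {
        return m.Value.Replace(" ", "_");
   }
}

void replacingMethod() {

   ...

   // [^<>]* keeps each match inside a single tag. A greedy "<.*>" would run
   // from the first '<' to the last '>' on a line and change the data too.
   Regex re = new Regex("<[^<>]*>");

   MyReplacer r = new MyReplacer();
   // Assign the replace method to the MatchEvaluator delegate.
   MatchEvaluator myEvaluator = new MatchEvaluator(r.ReplaceSpaces);

   // Replace the spaces in every matched tag using the delegate method.
   sInput = re.Replace(sInput, myEvaluator);
}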
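
For reference, here is a minimal self-contained sketch of the same MatchEvaluator idea, run against the sample from the question. The names TagNameFixer and Fix are only illustrative. It assumes the tags carry no attributes, because every space between < and > is turned into an underscore, and that < and > never occur in the data. The pattern <[^<>]*> matches one tag at a time, so the text between an opening and a closing tag is never part of a match.

```csharp
using System;
using System.Text.RegularExpressions;

static class TagNameFixer
{
    // One tag per match: '<', then any run of characters other than '<' or '>', then '>'.
    private static readonly Regex Tag = new Regex("<[^<>]*>");

    // Replaces spaces with underscores inside tags only; text between tags is left alone.
    public static string Fix(string input)
    {
        // The lambda is converted to a MatchEvaluator delegate.
        return Tag.Replace(input, m => m.Value.Replace(" ", "_"));
    }

    static void Main()
    {
        string broken =
            "<root>\n" +
            "   <Abc Dfg Xyz>data data data</Abc Dfg Xyz>\n" +
            "   <Kmn fsd>data data</Kmn fsd>\n" +
            "   <Aa bb/>\n" +
            "</root>";

        Console.WriteLine(Fix(broken));
    }
}
```

If the real input does contain attributes, or tags written with a space before the closing slash such as <Aa bb />, this sketch would mangle them, so the pattern would need to be narrowed first.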
ddr
  • +1 - In most scenarios XML and HTML shouldn't be parsed with Regex. However I agree that this case is specific enough to warrant using regex (assuming OP has given all the information). The string in OP's case is no longer XML, it just looks like XML. – keyboardP Jul 29 '13 at 23:08