I have this XML, which may contain nested (unescaped) double quotes inside attribute values:

<test>
    <tag1 att1="This has "nested double quotes"">
        <tag2 att2="This also has a nested " double quotes"></tag2>
    </tag1>
</test>

I need a regex that selects all the nested double quotes, in this case:

  • "nested double quotes"
  • nested " double

and replaces them with the &quot; entity. The final XML should look like this:

<test>
    <tag1 att1="This has &quot;nested double quotes&quot;">
        <tag2 att2="This also has a nested &quot; double quotes"></tag2>
    </tag1>
</test>

Is it possible to achieve this using regex?

revy
  • Is it _possible_ to achieve this using regex? Maybe, but now would be a good time to learn how to use an XML parser! It will give you massively more reliable results than regex would. – Tim Biegeleisen Aug 30 '19 at 10:20
  • @TimBiegeleisen Unfortunately I have no control over the XML source, so I have to fix the problem myself. I agree that it should be properly escaped from the beginning, but I can't do anything about that. – revy Aug 30 '19 at 10:26

0 Answers