I am trying to find all the invalid characters in an XML file. According to the W3C XML recommendation, these are the valid XML characters:
#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
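The Char production maps directly onto a range check on code points. Here is a minimal Python sketch. The function name `is_xml_char` is illustrative and does not come from any library.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point `cp` matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)           # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF         # rest of the BMP below the surrogates
        or 0xE000 <= cp <= 0xFFFD       # BMP above the surrogates, minus FFFE/FFFF
        or 0x10000 <= cp <= 0x10FFFF    # supplementary planes
    )

print(is_xml_char(ord("a")))  # True
print(is_xml_char(0x0B))      # False: vertical tab is not allowed
```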
In decimal, the valid XML characters are:
9
10
13
32-55295
57344-65533
65536-1114111
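You can double-check the decimal values by converting the hex bounds with Python's built-in `int(s, 16)`. The list names below are illustrative.

```python
# Hex bounds taken from the Char production.
ranges_hex = [("9", "9"), ("A", "A"), ("D", "D"),
              ("20", "D7FF"), ("E000", "FFFD"), ("10000", "10FFFF")]

# Convert each bound from base 16 to a decimal integer.
ranges_dec = [(int(lo, 16), int(hi, 16)) for lo, hi in ranges_hex]

for lo, hi in ranges_dec:
    # Single values print alone; ranges print as "lo-hi".
    print(lo if lo == hi else f"{lo}-{hi}")
```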
I am trying to search for the invalid characters in Notepad++ with a regular expression.
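If the invalid characters appear in the file as raw characters, a negated character class built from the Char ranges matches them directly, without any decimal arithmetic. The following sketch uses Python's `re` module. Notepad++ uses a different regex engine, so its escape syntax for code points may differ and this pattern is not verified there. The `\x0b` (vertical tab) in the sample line is a hypothetical stand-in, because the exact characters from the snippet below are not visible.

```python
import re

# Matches any single character that is NOT allowed by the XML 1.0 Char production.
INVALID_XML_CHAR = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

# Hypothetical sample: a vertical tab inside "false".
sample = '<custom-attribute attribute-id="pageNoIndex">fal\x0bse</custom-attribute>'
for m in INVALID_XML_CHAR.finditer(sample):
    # Report the offset and the offending code point.
    print(m.start(), f"U+{ord(m.group()):04X}")
```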
A snippet from my XML:
<custom-attribute attribute-id="isContendFeed">fal  se</custom-attribute>
<custom-attribute attribute-id="pageNoFollow">fal  se</custom-attribute>
<custom-attribute attribute-id="pageNoIndex">fal se</custom-attribute>
<custom-attribute attribute-id="rrRecommendable">false</custom-attribute>
In the example above, I want my regular expression to find 
and 
because these characters are not allowed in XML.
I am not able to construct the regular expression for this.
These are the regular expressions I wrote for the decimal ranges:
32-55295 : (3[2-9]|[4-9][0-9]|[1-9][0-9]{2,3}|[1-4][0-9]{4}|5[0-4][0-9]{3}|55[01][0-9]{2}|552[0-8][0-9]|5529[0-5])
57344-65533 : (5734[4-9]|573[5-9][0-9]|57[4-9][0-9]{2}|5[89][0-9]{3}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[0-2][0-9]|6553[0-3])
65536-1114111 : (6(5(5(3[6-9]|[4-9][0-9])|[6-9][0-9]{2})|[6-9][0-9]{3})|[7-9][0-9]{4}|[1-9][0-9]{5}|1(0[0-9]{5}|1(0[0-9]{4}|1([0-3][0-9]{3}|4(0[0-9]{2}|1(0[0-9]|1[01]))))))
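The three range patterns, together with 9, 10 and 13, can be joined into a single alternation. The sketch below uses Python's `re.fullmatch` so that the whole number must match, and it spot-checks the result against a plain numeric definition at the range boundaries. The names `R1`, `R2`, `R3`, `VALID_DECIMAL` and `in_valid_range` are illustrative. `R3` here has balanced parentheses. This only helps when the characters appear as decimal numbers, for example in `&#NNN;` references. Raw characters in the file need a character class instead.

```python
import re

R1 = r"(3[2-9]|[4-9][0-9]|[1-9][0-9]{2,3}|[1-4][0-9]{4}|5[0-4][0-9]{3}|55[01][0-9]{2}|552[0-8][0-9]|5529[0-5])"
R2 = r"(5734[4-9]|573[5-9][0-9]|57[4-9][0-9]{2}|5[89][0-9]{3}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[0-2][0-9]|6553[0-3])"
R3 = (r"(6(5(5(3[6-9]|[4-9][0-9])|[6-9][0-9]{2})|[6-9][0-9]{3})|[7-9][0-9]{4}|[1-9][0-9]{5}"
      r"|1(0[0-9]{5}|1(0[0-9]{4}|1([0-3][0-9]{3}|4(0[0-9]{2}|1(0[0-9]|1[01]))))))")

# One alternation covering every valid decimal code point.
VALID_DECIMAL = re.compile(r"(?:9|10|13|" + R1 + "|" + R2 + "|" + R3 + ")")

def is_valid_decimal(s: str) -> bool:
    """True if the whole string is the decimal value of a valid XML character."""
    return VALID_DECIMAL.fullmatch(s) is not None

def in_valid_range(n: int) -> bool:
    """Reference definition using plain integer comparisons."""
    return n in (9, 10, 13) or 32 <= n <= 55295 or 57344 <= n <= 65533 or 65536 <= n <= 1114111

# Spot-check the regex against the numeric definition at the range boundaries.
for n in (8, 9, 10, 11, 13, 31, 32, 55295, 55296, 57343, 57344,
          65533, 65534, 65535, 65536, 1114111, 1114112):
    assert is_valid_decimal(str(n)) == in_valid_range(n), n
```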
These regular expressions work when used separately, but I have not been able to combine them into one complete regex.
Is there any way other than a regular expression to find the invalid characters? If not, please help me construct a regular expression that finds the invalid characters in my XML.
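One non-regex option is a small script that walks the file character by character and reports every code point outside the Char production. A regular XML parser would also reject the file, but it typically stops at the first error. The sketch below assumes a UTF-8 encoded file passed on the command line. The function names are illustrative. It splits lines only on `\n`, because Python's `str.splitlines` also treats some invalid control characters (such as `\x0b` and `\x0c`) as line breaks.

```python
import sys

def is_xml_char(cp: int) -> bool:
    """Return True if code point `cp` matches the XML 1.0 Char production."""
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)

def find_invalid_chars(text: str):
    """Yield (line, column, code point) for each invalid character, 1-based."""
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if not is_xml_char(ord(ch)):
                yield lineno, col, ord(ch)

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python find_invalid.py catalog.xml
    with open(sys.argv[1], encoding="utf-8") as f:
        for lineno, col, cp in find_invalid_chars(f.read()):
            print(f"line {lineno}, column {col}: U+{cp:04X}")
```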