I have a program that generates some data and saves it as XML. Unfortunately, for my purposes I can't save it in the newer XML version that allows characters like 0x1F, so I need to eliminate this character from my XML. The only thing I have found that seems to do this is http://benjchristensen.com/2008/02/07/how-to-strip-invalid-xml-characters/, but I don't know JavaScript and would like to use a script that I can understand. I know basic C#, but I'm not great at it. What would be the easiest way to filter out this character? I think this is a good question for the community, since finding a working C# method through Google has proved challenging.
- I've never heard of a kind of XML which permits characters not permitted by XML. Can you provide a link? – John Saunders May 18 '12 at 01:32
- Unless I am misreading it (sorry if I am, my English isn't too good), the second answer at http://stackoverflow.com/questions/6693153/what-is-character-0x1f seems to say that there is an XML 1.1 that allows it: "it is indeed not a valid text character in XML 1.0 (but allowed in XML 1.1). In an UTF-8 input string, you can also safely replace the byte 0x1f with 0x09(Tab) to work around the problem. Alternatively, declare the document as XML 1.1 and use an XML 1.1 parser." @JohnSaunders – JosephG May 18 '12 at 02:04
- How are you saving the XML now? – James Manning May 18 '12 at 03:17
- The XML 1.1 specification may permit that, but you are assuming the existence of XML 1.1 _parsers_. If you don't have access to such a parser, then it doesn't matter what the spec allows. It doesn't seem to be widely implemented. – John Saunders May 18 '12 at 12:42
1 Answer
From this post: How can you strip non-ASCII characters from a string? (in C#)
Adjusting it for your case:
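using System.IO;
using System.Text.RegularExpressions;
// Note: this range removes every character below U+0020, including tab, CR and LF.
string s = File.ReadAllText(filepath);
s = Regex.Replace(s, @"[\u0000-\u001F]", string.Empty);
File.WriteAllText(newFilepath, s);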
Then test the new file. Don't overwrite the old until you know if this works or not.
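The range `[\u0000-\u001F]` also covers tab (U+0009), line feed (U+000A) and carriage return (U+000D), which XML 1.0 does allow, so it will flatten line breaks. If that matters, a narrower pattern can skip those three. This is only a minimal sketch: the class and method names (`XmlCleaner`, `StripInvalidC0`) are made up for illustration, and it handles only the control characters below U+0020, not other code points that XML 1.0 rejects (such as U+FFFE or unpaired surrogates).

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class XmlCleaner
{
    // Control characters below U+0020 that XML 1.0 forbids:
    // everything except tab (U+0009), LF (U+000A) and CR (U+000D).
    private static readonly Regex InvalidC0 =
        new Regex(@"[\u0000-\u0008\u000B\u000C\u000E-\u001F]");

    public static string StripInvalidC0(string input)
    {
        // Replace each forbidden control character with nothing.
        return InvalidC0.Replace(input, string.Empty);
    }
}
```

It would slot in between the read and the write, for example `s = XmlCleaner.StripInvalidC0(s);`, still writing to a new file rather than over the original.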

Chuck Savage
- Perfect! In case anyone else reads this with the same issue I had: make sure to put "" around the file path you pass in as (filepath). That slipped by me and gave me a compiler error. – JosephG May 19 '12 at 01:42
- This worked better for me: `sprefs = System.Text.RegularExpressions.Regex.Replace(sprefs, @"[\u001F-\u001F]", string.Empty);` The other one is more inclusive and made my XML content into one line. – Adam Bruss Dec 19 '16 at 16:18
- @AdamBruss `[a-z]` is a whole range of characters, as you saw. If you just want to replace the one character, what you did works, but is overly verbose :) Instead you could have used `@"\u001F"`. But even that is more complicated than needed. For such a simple case a normal string replace should work fine. No need for a regex. – Wodin Oct 25 '20 at 09:39
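
For reference, the plain string replace suggested in the last comment might look like the sketch below. The class and method names (`UnitSeparatorFilter`, `RemoveUnitSeparator`) are hypothetical, and it assumes 0x1F is the only problem character in the file.

```csharp
// Hypothetical helper, not part of any library.
public static class UnitSeparatorFilter
{
    public static string RemoveUnitSeparator(string input)
    {
        // "\u001F" is the single character U+001F (unit separator);
        // every other character, including line breaks, is left alone.
        return input.Replace("\u001F", string.Empty);
    }
}
```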