I have an application which (like many others) takes in user input, stores it in a database and then later processes it using (amongst other things) XML tools. The application takes in free text input and, like many other developers, I am very careful with escaping and quoting so that it can handle input containing different types of whitespace, quote characters, reserved XML characters etc.
However, occasionally a user will manage to enter a string containing a vertical tab character (hex 0B) or a form feed (hex 0C). Such a string cannot be processed by XML tools at all and causes the app to barf.
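As a minimal illustration of the failure, here is a sketch that uses Python's standard xml.etree.ElementTree purely as an example parser. The helper name `survives_xml_round_trip` and the `value` element are made up for this sketch and are not part of my application:

```python
import xml.etree.ElementTree as ET


def survives_xml_round_trip(text: str) -> bool:
    """Return True if `text` can be written into an XML element and read back unchanged.

    Hypothetical helper, for illustration only.
    """
    elem = ET.Element("value")
    elem.text = text
    # Serialization escapes <, > and &, but passes control characters through as-is.
    serialized = ET.tostring(elem, encoding="unicode")
    try:
        parsed = ET.fromstring(serialized)
    except ET.ParseError:
        # The parser rejects characters that XML 1.0 does not allow,
        # which includes vertical tab (0x0B) and form feed (0x0C).
        return False
    return parsed.text == text


if __name__ == "__main__":
    for sample in ["plain <text> & 'quotes'", "tab\tand newline\n", "page one\x0cpage two", "a\x0bb"]:
        print(repr(sample), survives_xml_round_trip(sample))
```

Ordinary text, reserved XML characters, tabs and newlines come back intact, while anything containing 0B or 0C fails at parse time even though it was written out without complaint.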
In my application it's quite important to preserve the original input during the 'round trip' process, so I'm loath to just strip out any characters I don't like, especially things like form feed, which are still occasionally used in plain text files.
Is there any accepted best practice or general strategy for handling these characters when XML processing is involved?