This shouldn't be fixed after the XML is generated; it's a bug in the code that generates the XML in the first place. Fix the generator that produces the invalid XML instead of patching the invalid XML afterwards.
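A minimal sketch of what fixing the generator can look like, assuming the generator currently builds XML by pasting raw values into strings. The function `build_user_xml` and the `user`/`comment` element names are hypothetical. Instead of concatenating strings, the generator lets a library (here Python's standard `xml.etree.ElementTree`) serialize the values, so escaping happens while the XML is generated:

```python
import xml.etree.ElementTree as ET


def build_user_xml(name: str, comment: str) -> str:
    # Hypothetical generator: builds <user name="..."><comment>...</comment></user>.
    # Values are never pasted into the markup by hand; ElementTree escapes
    # special characters in attribute values and text when serializing.
    root = ET.Element("user", attrib={"name": name})
    child = ET.SubElement(root, "comment")
    child.text = comment
    return ET.tostring(root, encoding="unicode")


print(build_user_xml('Tom & "Jerry"', "1 < 2"))
```

This should print `<user name="Tom &amp; &quot;Jerry&quot;"><comment>1 &lt; 2</comment></user>`. As far as I know, ElementTree does not filter out characters that XML 1.0 forbids entirely (such as U+0000), so input containing those still needs handling before it reaches the serializer.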
For the encoding rules, see the XML specification at https://www.w3.org/TR/xml/#intern-replacement, but note that many programming languages already have functions or libraries for this. For example, to XML-encode a string in PHP, use htmlspecialchars($str, ENT_QUOTES | ENT_SUBSTITUTE | ENT_DISALLOWED | ENT_XML1, 'UTF-8', true);
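For a language without such a built-in, here is a rough Python sketch of what that PHP call does; `xml_escape` is a hypothetical name, and this is an approximation, not a full reimplementation of htmlspecialchars. It escapes `&`, `<`, `>`, `"` and `'` (ENT_QUOTES with ENT_XML1 turns `'` into `&apos;`), always re-encodes `&` (the `true` double_encode argument), replaces invalid UTF-8 byte sequences with U+FFFD (like ENT_SUBSTITUTE), and replaces characters outside the XML 1.0 `Char` production (section 2.2 of the spec) with U+FFFD (like ENT_DISALLOWED):

```python
from typing import Union

# The five characters that get entity-encoded (ENT_QUOTES | ENT_XML1 style).
_ENTITIES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;"}


def _is_xml_char(cp: int) -> bool:
    # XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF]
    #                          | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def xml_escape(value: Union[str, bytes]) -> str:
    # Like ENT_SUBSTITUTE: invalid UTF-8 byte sequences become U+FFFD.
    if isinstance(value, bytes):
        value = value.decode("utf-8", errors="replace")
    out = []
    for ch in value:
        if not _is_xml_char(ord(ch)):
            # Like ENT_DISALLOWED: code points XML 1.0 does not allow become U+FFFD.
            out.append("\ufffd")
        else:
            # Like double_encode=true: an existing "&amp;" is encoded again.
            out.append(_ENTITIES.get(ch, ch))
    return "".join(out)
```

Replacing forbidden characters rather than dropping them keeps the output the same length in characters and makes the bad input visible, which is also the behaviour the ENT_SUBSTITUTE and ENT_DISALLOWED flags choose.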
For many other languages there's libxml2 (see http://xmlsoft.org/). It is written in C and has bindings for, among others, C++, C#, Python, Delphi/Pascal, Ruby, Perl and PHP.