I am really new to regex, and I was following other StackOverflow answers to build a sed command that removes invalid XML characters.
sed -ie 's/[^\u0009\r\n\u0020-\uD7FF\uE000-\uFFFD\ud800\udc00-\udbff\udfff]//g' myfile.xml
When I run this, it deletes a bunch of letters. For example, from the word "company" it deletes o, m, p, a, y, etc. Lowercase letters are affected the most.
Either there is something wrong with my regex, or maybe sed isn't treating it as a regex the way I expect. Would you please help me? Thank you.