
I am trying to read data from a DB and assign it to a DataObject. However, one of the columns in the DB contains an invalid character (please see the highlighted area in the image https://i.stack.imgur.com/6bpx4.png) that fails to parse as UTF-8 XML. Can anyone please help me resolve it? Thanks in advance.

At present I am using the following code to remove invalid characters:

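    // Removes every character that the XML 1.0 "Char" production does not allow.
    // (The method name is illustrative; the original snippet had no method header.)
    public static String stripInvalidXmlChars(String in) {
        if (in == null || in.isEmpty()) {
            return ""; // vacancy test
        }
        StringBuilder out = new StringBuilder(in.length()); // holds the output
        // Walk the string by code point, not by char: a Java char is only 16 bits,
        // so characters above U+FFFF arrive as surrogate pairs, and a char-based
        // test of (current >= 0x10000) can never be true.
        for (int i = 0; i < in.length(); ) {
            int current = in.codePointAt(i); // the current code point
            if (current == 0x9 || current == 0xA || current == 0xD
                    || (current >= 0x20 && current <= 0xD7FF)
                    || (current >= 0xE000 && current <= 0xFFFD)
                    || (current >= 0x10000 && current <= 0x10FFFF)) {
                out.appendCodePoint(current);
            }
            i += Character.charCount(current); // skip 1 or 2 chars
        }
        return out.toString();
    }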
  • Please elaborate as to what is considered a solution. – vtd-xml-author Jul 24 '16 at 07:24
  • It's unclear what you are saying is invalid and how. Is the field supposed to contain UTF-8 encoded text and the byte sequence it contains is invalid UTF-8? Or, is it simply that you want to put a sequence of characters in an XML document and it happens to contain characters that XML forbids? – Tom Blodget Sep 25 '16 at 19:56
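
The second comment distinguishes two cases: bytes that are not valid UTF-8, and valid text that contains characters XML forbids. For the first case, a minimal check is sketched here, assuming the raw column bytes can be fetched (for example with JDBC's `ResultSet.getBytes`); `Utf8Check` and `isValidUtf8` are hypothetical names. It uses a strict `CharsetDecoder` because `new String(bytes, StandardCharsets.UTF_8)` silently replaces malformed sequences with U+FFFD instead of reporting them.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Hypothetical helper: returns true only if the bytes are well-formed UTF-8.
    // The decoder is set to REPORT so malformed input throws instead of being replaced.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] good = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        // 0xC3 starts a two-byte sequence, but 0x28 is not a continuation byte.
        byte[] bad = {(byte) 0xC3, (byte) 0x28};
        System.out.println(isValidUtf8(good)); // prints "true"
        System.out.println(isValidUtf8(bad));  // prints "false"
    }
}
```

If the bytes pass this check, the problem is the second case: characters that XML 1.0 does not allow, which need filtering rather than re-decoding.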

1 Answer


Finally I found a solution to my problem.

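Most likely you want to strip both non-printable and control characters. To do this, you can use the following regex: "[^\x20-\x7E]", or simply: "[^ -~]". Both match every character outside printable ASCII (space, U+0020, through tilde, U+007E), so they also remove tabs, line breaks and all non-ASCII letters, not only the characters XML forbids.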
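A minimal Java sketch of this regex, assuming the column value has already been read into a `String`; the class `AsciiOnly` and method `keepPrintableAscii` are hypothetical names. It applies the pattern with `String.replaceAll` and shows what is kept and what is dropped:

```java
public class AsciiOnly {
    // Hypothetical helper: keeps only printable ASCII (space through tilde).
    // Everything else, including tabs, line breaks and accented letters, is removed.
    static String keepPrintableAscii(String in) {
        // Backslashes are doubled so the regex engine sees \x20 and \x7E.
        return in.replaceAll("[^\\x20-\\x7E]", "");
    }

    public static void main(String[] args) {
        // Contains an accented letter, a control character and a tab.
        String in = "caf\u00e9\u0001 ok\tend";
        System.out.println(keepPrintableAscii(in)); // prints "caf okend"
    }
}
```

This is the simplest fix but it is lossy: if the column can hold accented or other non-ASCII text that must survive, filtering only the characters XML 1.0 forbids, as the question's loop does, is the narrower option.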

You can also refer to the question "Replace non ASCII character from string" for more information on this topic.
