A similar question, but for UTF-, is posted on Stack Overflow: What is XML BOM and how do I detect it?

My problem is: do I need to remove the BOM from my target files, or should I leave it in (since it is required for UTF-16)? Here is a brief description of the situation (I have posted the details as part of an answer to the question above):

  • I have a couple of XML files, each with two lines: the first line is the XML declaration and the second line holds all the other tags, i.e. the XML content. (The second line is created from the whole XML file except its first line.)
  • I am now merging a few such XML files into one. The two lines from each source file become a single line in the target file, so the target file contains several such lines, each representing a complete XML source file.
  • When I read the source files and write them to the target file, I use the following Java code, and I am able to create the target files.

    Here is the Java code for reading:

    // Decode the source file as UTF-16; the decoder consumes a leading BOM.
    Reader reader = new InputStreamReader(new FileInputStream(fileName),
            Charset.forName("UTF-16"));
    BufferedReader br = new BufferedReader(reader);
    String line = br.readLine();
    // Collect the line and terminate it with CRLF.
    StringBuffer lineBuffer = new StringBuffer();
    lineBuffer.append(line);
    lineBuffer.append("\r\n");
    

    Here is the Java code for writing:

    // Open the target file in append mode and encode the text as UTF-16.
    Writer writer = new OutputStreamWriter(new FileOutputStream(
            targetFile, true), "UTF-16");
    BufferedWriter bw = new BufferedWriter(writer);
    bw.write(lineBuffer.toString());
  • The only issue is that the target file looks different in different editors:
  • Notepad++ with UTF-16 support shows it perfectly fine.
  • Windows 7 Notepad shows no special characters, but the second line of the target file appears in a slightly smaller font.
  • Notepad++ without UTF-16 support shows a dot on top of the < at the beginning of the second line and of every line after it.
  • In some other editors, a ? appears at the start of the line, just before the XML declaration, from the second line onward (the first line is fine).
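
One way to check whether the stray character is a byte-order mark is to look at the raw bytes instead of relying on an editor. The following is only a sketch under stated assumptions: `BomCounter` and its methods are hypothetical names, and it writes to an in-memory stream instead of a real target file. It mimics the append pattern described above (a fresh "UTF-16" `OutputStreamWriter` for every chunk) and counts how often the bytes FE FF occur at even offsets.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

class BomCounter {

    /** Counts big-endian BOM code units (bytes FE FF) at even byte offsets. */
    static int countBigEndianBoms(byte[] data) {
        int count = 0;
        for (int i = 0; i + 1 < data.length; i += 2) {
            if ((data[i] & 0xFF) == 0xFE && (data[i + 1] & 0xFF) == 0xFF) {
                count++;
            }
        }
        return count;
    }

    /** Mimics the append pattern: a new "UTF-16" writer is created per chunk. */
    static byte[] writeChunks(String... chunks) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String chunk : chunks) {
            Writer writer = new OutputStreamWriter(out, "UTF-16");
            writer.write(chunk);
            writer.flush(); // push the encoded bytes into the shared stream
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = writeChunks("<a/>\r\n", "<b/>\r\n", "<c/>\r\n");
        System.out.println("BOMs found: " + countBigEndianBoms(bytes));
    }
}
```

If the count equals the number of appended chunks, the extra characters the editors display are most likely BOMs written at the start of every append, not just at the start of the file.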

Source file:

<?xml version="1.0" encoding="UTF-16"?>
<OtherTags></OtherTags>

Target file:

<?xml version="1.0" encoding="UTF-16"?><OtherTags></OtherTags>
?<?xml version="1.0" encoding="UTF-16"?><OtherTags></OtherTags>
?<?xml version="1.0" encoding="UTF-16"?><OtherTags></OtherTags>
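
If it turns out that only a single BOM is wanted at the very start of the target file, one possible approach is sketched below. It is not a definitive fix: `SingleBomAppender` and `appendLine` are hypothetical names, and it assumes that whatever reads the merged file expects big-endian UTF-16 with one leading BOM. The idea is to use the "UTF-16" charset (which writes a BOM when encoding) only while the target is still empty, and "UTF-16BE" (which writes none) for every later append.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class SingleBomAppender {

    /** Appends one line; a BOM is written only if the target is new or empty. */
    static void appendLine(File target, String line) throws IOException {
        boolean empty = !target.exists() || target.length() == 0;
        // UTF_16 emits FE FF before the first character; UTF_16BE emits no BOM.
        Charset charset = empty ? StandardCharsets.UTF_16 : StandardCharsets.UTF_16BE;
        try (Writer writer = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(target, true), charset))) {
            writer.write(line);
            writer.write("\r\n");
        }
    }

    public static void main(String[] args) throws IOException {
        File target = File.createTempFile("merged", ".xml");
        target.deleteOnExit();
        String doc = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><OtherTags></OtherTags>";
        appendLine(target, doc);
        appendLine(target, doc);
        System.out.println("Target size in bytes: " + target.length());
    }
}
```

Whether the BOM should be kept at all still depends on the consumer of the merged file; the sketch only shows how to avoid repeating it on each append.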