When I try to write some UTF-8 data to a file, I end up with garbage in the file. The code is as follows:

public static boolean saveToFile(StringBuffer buffer,
                                 String fileName,
                                 ArrayList exceptionList,
                                 String className)
{
    log.debug("In saveToFile for file [" + fileName + "]");

    RandomAccessFile raf = null;
    File file = new File(fileName);
    File backupFile = new File(fileName + "_bck");

    try
    {
        if (file.exists())
        {
            if (backupFile.exists())
            {
                backupFile.delete();
            }
            file.renameTo(backupFile);
        }
        raf = new RandomAccessFile(file, "rw");
        raf.writeBytes(buffer.toString());
        raf.close();

The output of buffer.toString() is:

<?xml version="1.0" encoding="UTF-8"?>
<ivr>
<version>1.1</version>
<templateName>αβγδεζη</templateName>

The data in the file, however, is:

<?xml version="1.0" encoding="UTF-8"?>
<ivr>
<version>1.1</version>
<templateName>▒▒▒▒▒▒▒</templateName>

How can I make sure that the data in the file itself is UTF-8?

Manuj
  • Have you tried writeUTF rather than writeBytes? – JamesB Jul 24 '14 at 11:51
  • I have edited the post to show where buffer comes from: it is passed in to the function. The output of buffer.toString() is already shown in the post above. If that doesn't answer your question, please let me know. – Manuj Jul 24 '14 at 11:51
  • I haven't tried writeUTF yet, though it is on my to-do list. My concern with writeUTF is that (I read somewhere) it first writes the number of characters to the file and then the characters. My other concern is whether writeUTF produces UTF-8 or UTF-16. – Manuj Jul 24 '14 at 11:58

3 Answers

I'm not surprised you get garbage:

 raf.writeBytes(buffer.toString())

The documentation for RandomAccessFile.writeBytes(String) says (emphasis added):

Writes the string to the file as a sequence of bytes. Each character in the string is written out, in sequence, by discarding its high eight bits.

That operation produces a correctly encoded file only when every character is plain ASCII, where the low eight bits happen to match the UTF-8 encoding. For any other character, such as the Greek letters in your data, it won't. That writeBytes() method is a foolish design by the Java developers. You need to correctly encode your text as bytes in UTF-8, and then write those bytes.
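To make the effect of discarding the high eight bits concrete, here is a minimal sketch. The class and its helper methods (`WriteBytesDemo`, `lowBytes`, `hex`) are hypothetical names made up for illustration; `lowBytes` only mimics the behaviour the Javadoc describes rather than calling `RandomAccessFile` itself.

```java
import java.nio.charset.StandardCharsets;

public class WriteBytesDemo {

    // Mimics the documented behaviour of RandomAccessFile.writeBytes(String):
    // each char is narrowed to a byte, which keeps only its low eight bits.
    public static byte[] lowBytes(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i);
        }
        return out;
    }

    // Formats bytes as space-separated upper-case hex pairs.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String text = "\u03B1\u03B2\u03B3"; // the Greek letters alpha, beta, gamma
        System.out.println("low bytes: " + hex(lowBytes(text)));                         // B1 B2 B3
        System.out.println("UTF-8:     " + hex(text.getBytes(StandardCharsets.UTF_8))); // CE B1 CE B2 CE B3
    }
}
```

Each Greek letter needs two bytes in UTF-8, but only one byte per character survives the truncation, so a reader decoding the file as UTF-8 sees invalid sequences.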

Do you really need to operate on the file as a random access file? If not, just write it with a Writer wrapping an OutputStream.
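If random access is not needed, the Writer approach could look roughly like the sketch below. It assumes the whole text is written in one go and that replacing any existing file content is acceptable; the class name `Utf8WriterDemo` and the `roundTrip` helper are hypothetical and exist only to make the example self-contained.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8WriterDemo {

    // Writes text to the target file as UTF-8, replacing any existing content.
    public static void save(String text, Path target) throws IOException {
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(target.toFile()), StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    // Demo helper: saves to a temporary file and returns the bytes found on disk.
    public static byte[] roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("utf8-writer-demo", ".xml");
            try {
                save(text, tmp);
                return Files.readAllBytes(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<templateName>\u03B1\u03B2\u03B3</templateName>";
        byte[] onDisk = roundTrip(xml);
        // Decoding the file content as UTF-8 gives back the original text.
        System.out.println(new String(onDisk, StandardCharsets.UTF_8).equals(xml));
    }
}
```

Passing the charset explicitly matters here: an OutputStreamWriter created without one falls back to the platform default, which may not be UTF-8.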

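You could use Charset.encode(String) to produce a ByteBuffer holding the encoded bytes, then write those bytes to the file. The buffer's backing array can be larger than the encoded data, so write only the bytes up to its limit:

 ByteBuffer bytes = StandardCharsets.UTF_8.encode(buffer.toString());
 raf.write(bytes.array(), bytes.arrayOffset() + bytes.position(), bytes.remaining());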
Raedwald
  • Yes, while discussing this offline in the office we came to the same conclusion: the code needs to be reworked to use something like an OutputStream. – Manuj Jul 24 '14 at 12:14
  • When I tried `StandardCharsets.UTF_8.encode(buffer).array()`, it produced a byte array twice as long as the number of characters in the original, even when the UTF-8 representation needed fewer bytes, with the rest being nulls. Rather use `buffer.toString().getBytes(StandardCharsets.UTF_8)` – David Fraser Mar 07 '23 at 09:12

The Javadoc for RandomAccessFile states the following for writeBytes():

Writes the string to the file as a sequence of bytes. Each character in the string is written out, in sequence, by discarding its high eight bits. The write starts at the current position of the file pointer.

Assuming that discarding parts of your String isn't what you want, you should be using writeUTF():

Writes a string to the file using modified UTF-8 encoding in a machine-independent manner.
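One thing worth checking before relying on writeUTF() for a text or XML file is what actually lands on disk. The sketch below is a small experiment, not a recommendation; `WriteUtfDemo` and `writeUtfBytes` are hypothetical names. It shows that writeUTF() first writes a two-byte length (the number of encoded bytes) and then the modified UTF-8 data, so the file does not start with the text itself.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriteUtfDemo {

    // Writes text with RandomAccessFile.writeUTF to a temporary file
    // and returns exactly what ended up on disk.
    public static byte[] writeUtfBytes(String text) {
        try {
            Path tmp = Files.createTempFile("write-utf-demo", ".bin");
            try {
                try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "rw")) {
                    raf.writeUTF(text);
                }
                return Files.readAllBytes(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : writeUtfBytes("\u03B1\u03B2")) { // alpha, beta
            hex.append(String.format("%02X ", b & 0xFF));
        }
        // Two length bytes (00 04), then two bytes per Greek letter.
        System.out.println(hex.toString().trim()); // 00 04 CE B1 CE B2
    }
}
```

Because of that prefix, and because the length field limits a single call to 65535 encoded bytes, the result is meant to be read back with readUTF() rather than opened as a plain UTF-8 document.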

Andrew Stubbs
  • I will try writeUTF(), though just as an experiment, as I am now more inclined to use OutputStream etc. – Manuj Jul 24 '14 at 12:15
// Encode the text as UTF-8 explicitly, then write the resulting bytes as-is.
String txt = buffer.toString();
raf.write(txt.getBytes(StandardCharsets.UTF_8));
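For context, those two lines could sit inside a save routine along the lines of the sketch below. It is only an illustration under assumptions: `SaveUtf8Demo`, `save` and `roundTrip` are hypothetical names, the backup handling from the question is left out, and truncating with `setLength(0)` assumes the file should be rewritten from scratch.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SaveUtf8Demo {

    // Encodes the text as UTF-8 and writes the bytes through a RandomAccessFile.
    public static void save(String text, File file) throws IOException {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        // try-with-resources closes the file even if the write fails.
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(0); // drop any previous content
            raf.write(bytes);
        }
    }

    // Demo helper: saves to a temporary file and returns the bytes found on disk.
    public static byte[] roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("save-utf8-demo", ".xml");
            try {
                save(text, tmp.toFile());
                return Files.readAllBytes(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u03B1\u03B2\u03B3\u03B4\u03B5\u03B6\u03B7"; // the seven Greek letters from the question
        byte[] onDisk = roundTrip(text);
        System.out.println(text.length() + " chars -> " + onDisk.length + " bytes"); // 7 chars -> 14 bytes
    }
}
```

The byte count differs from the character count because each of these letters takes two bytes in UTF-8, which is exactly the information writeBytes() throws away.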
Dietrich
  • Can you add an explanation of what this code does, and how it answers the question? Answers with a little explanation are usually more helpful than just code. – Ljm Dullaart Jun 14 '21 at 11:44