The API docs say the following for readUTF:
Reads in a string from this file. The string has been encoded using a
modified UTF-8 format.
The first two bytes are read, starting from the current file pointer,
as if by readUnsignedShort. This value gives the number of following
bytes that are in the encoded string, not the length of the resulting
string. The following bytes are then interpreted as bytes encoding
characters in the modified UTF-8 format and are converted into
characters.
This method blocks until all the bytes are read, the end of the stream
is detected, or an exception is thrown.
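To make that layout concrete, here is a small sketch that encodes a string with `DataOutputStream.writeUTF()` and inspects the bytes. It assumes that `DataOutputStream` produces the same format as `RandomAccessFile.writeUTF()`, which holds because both follow the `DataOutput.writeUTF` contract. The class name `UtfFormatDemo` is just for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Shows the byte layout that writeUTF() produces and readUTF() expects:
// a 2-byte unsigned big-endian length followed by the modified UTF-8 bytes.
class UtfFormatDemo {

    static byte[] encode(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // An in-memory stream does not fail, but writeUTF declares IOException
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] encoded = encode("Yoda\n");
        // The first two bytes, read as an unsigned short, are the byte count
        int length = ((encoded[0] & 0xFF) << 8) | (encoded[1] & 0xFF);
        System.out.println("total bytes = " + encoded.length); // 7
        System.out.println("length prefix = " + length);       // 5
    }
}
```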
Is your string formatted this way?
That would explain your EOF exception.
Since your file is a text file, your actual problem is decoding it.
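As a sketch of why this ends in an EOF exception: on a plain text file, readUTF() treats the first two characters as the length prefix. For a file starting with "Lu", that is 0x4C75 = 19573 bytes, far more than the file contains, so the read runs off the end. The temp file below is a hypothetical stand-in for jedis.txt.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Reproduces the EOFException: readUTF() on a plain text file interprets
// the first two bytes of text as a length prefix.
class ReadUtfOnTextFile {

    /** The byte count readUTF() would expect, taken from the first two bytes. */
    static int bogusLength(byte[] text) {
        return ((text[0] & 0xFF) << 8) | (text[1] & 0xFF);
    }

    /** Returns true if readUTF() hits the end of the file. */
    static boolean readUtfFails(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.readUTF();
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical temp file standing in for jedis.txt
        Path file = Files.createTempFile("jedis", ".txt");
        byte[] text = "Luke\nObi-wan\nYoda\n".getBytes(StandardCharsets.UTF_8);
        Files.write(file, text);
        // 'L' = 0x4C and 'u' = 0x75, so readUTF() expects 0x4C75 bytes
        System.out.println("expected length: " + bogusLength(text));
        System.out.println("EOFException thrown: " + readUtfFails(file));
        Files.delete(file);
    }
}
```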
The simplest answer I know is:
try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream("jedis.txt"), "UTF-8"))) {
    String line = null;
    while ((line = reader.readLine()) != null) {
        if (line.equals("Obi-wan")) {
            System.out.println("Yay, I found " + line + "!");
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}
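On Java 7 and later, the same search can also be written with java.nio.file, which takes the charset explicitly. A side benefit is that the reader returned by `Files.newBufferedReader` throws an exception on malformed input rather than silently substituting characters. This is a sketch; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class FindJedi {

    /** Returns true if some line of the file equals the given name. */
    static boolean containsLine(Path file, String name) throws IOException {
        // Decodes as UTF-8; malformed bytes cause an IOException on read
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.equals(name)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        if (containsLine(Paths.get("jedis.txt"), "Obi-wan")) {
            System.out.println("Yay, I found Obi-wan!");
        }
    }
}
```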
Or you can set the JVM's default encoding to UTF-8 with the system property file.encoding when you start the program:
java -Dfile.encoding=UTF-8 com.jediacademy.Runner arg1 arg2 ...
You could also try setting it at runtime with System.setProperty(...), but that is unreliable, because the JVM typically determines its default charset at startup. If you only need UTF-8 for this specific file, I would prefer the InputStreamReader approach.
With the property set on the command line, you can use FileReader and expect it to use UTF-8 as the default encoding. Note that this then applies to all the files that you read and write.
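To check whether the -Dfile.encoding flag actually took effect before relying on FileReader, a tiny diagnostic like this one (hypothetical class name) prints both the property and the charset the JVM is really using.

```java
import java.nio.charset.Charset;

// Prints the file.encoding property and the JVM's actual default charset.
class ShowDefaultCharset {

    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + defaultCharsetName());
    }
}
```

Run as `java -Dfile.encoding=UTF-8 ShowDefaultCharset`, both lines should report UTF-8. On Java 11 and later, FileReader also has constructors that take a Charset (for example `new FileReader("jedis.txt", StandardCharsets.UTF_8)`), which avoids depending on the default at all.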
If you intend to detect decoding errors in your file, you have to use the InputStreamReader approach with the constructor that takes a CharsetDecoder.
Something like:
CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
decoder.onMalformedInput(CodingErrorAction.REPORT);
decoder.onUnmappableCharacter(CodingErrorAction.REPORT);
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream("jedis.txt"), decoder));
You may choose between the actions IGNORE, REPLACE and REPORT.
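Here is a self-contained sketch of what the three actions do with a malformed byte sequence. It decodes an in-memory byte array instead of jedis.txt. The bytes are an assumption chosen for illustration: 0xC3 starts a two-byte UTF-8 sequence, but 0x28 ('(') is not a valid continuation byte.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

class DecoderActions {

    /** Decodes bytes as UTF-8, applying the given action to bad input. */
    static String decode(byte[] bytes, CodingErrorAction action) throws IOException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), decoder))) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] bad = {'O', 'b', 'i', (byte) 0xC3, 0x28};
        // REPLACE substitutes U+FFFD for the bad byte; IGNORE drops it
        System.out.println(decode(bad, CodingErrorAction.REPLACE));
        System.out.println(decode(bad, CodingErrorAction.IGNORE)); // Obi(
        try {
            decode(bad, CodingErrorAction.REPORT);
        } catch (MalformedInputException e) {
            // REPORT surfaces the problem as an IOException subclass
            System.out.println("REPORT threw: " + e);
        }
    }
}
```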
EDIT
If you insist on using RandomAccessFile, you would need to know the exact offset of the line that you intend to read. And not only that: in order to read with the readUTF() method, you must have written the file with the writeUTF() method. As the Javadoc quoted above states, readUTF() expects a specific format in which the first two bytes are an unsigned 16-bit value giving the length, in bytes, of the modified UTF-8 string.
As such, if you do:
try (RandomAccessFile raf = new RandomAccessFile("jedis.bin", "rw")) {
    raf.writeUTF("Luke\n");   // 2 bytes for length + 5 bytes
    raf.writeUTF("Obiwan\n"); // 2 bytes for length + 7 bytes
    raf.writeUTF("Yoda\n");   // 2 bytes for length + 5 bytes
} catch (IOException e) {
    e.printStackTrace();
}
You should not have any problems reading back from this file with the readUTF() method, as long as you can determine the offset of the line that you want to read.
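If you cannot compute the offsets up front, one option is to scan the file once and record where each string starts. This sketch (hypothetical names) reads each 2-byte length prefix and jumps past the string bytes without decoding them. It assumes the file contains only writeUTF() records.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

class UtfIndex {

    /** Scans a file of writeUTF() records and returns each record's start offset. */
    static List<Long> buildIndex(RandomAccessFile raf) throws IOException {
        List<Long> offsets = new ArrayList<>();
        raf.seek(0);
        while (raf.getFilePointer() < raf.length()) {
            long start = raf.getFilePointer();
            offsets.add(start);
            int byteCount = raf.readUnsignedShort(); // the writeUTF length prefix
            raf.seek(start + 2 + byteCount);         // skip the string bytes
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile("jedis.bin", "r")) {
            List<Long> index = buildIndex(raf);
            raf.seek(index.get(2)); // third record
            System.out.print(raf.readUTF());
        }
    }
}
```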
If you open the file jedis.bin, you will notice it is a binary file, not a text file.
Now, I know that "Luke\n" is 5 bytes in UTF-8 and "Obiwan\n" is 7 bytes in UTF-8, and that the writeUTF() method inserts 2 bytes in front of each of these strings. Therefore, before "Yoda\n" there are (5+2) + (7+2) = 16 bytes.
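The same arithmetic can be done in code. Rather than summing `getBytes(UTF_8).length + 2`, this sketch (hypothetical names) writes the preceding records to an in-memory `DataOutputStream` and takes its size. Modified UTF-8 differs from standard UTF-8 for U+0000 and for supplementary characters, so counting with writeUTF itself keeps the offsets exact.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class UtfOffsets {

    /** Offset of record {@code index} if the records were written in order with writeUTF(). */
    static long offsetOf(List<String> records, int index) {
        // size() counts each 2-byte prefix plus the modified UTF-8 bytes
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            for (String s : records.subList(0, index)) {
                out.writeUTF(s);
            }
            return out.size();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> jedis = Arrays.asList("Luke\n", "Obiwan\n", "Yoda\n");
        System.out.println(offsetOf(jedis, 2)); // 16
    }
}
```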
So, I could do something like this to reach the last line:
try (RandomAccessFile raf = new RandomAccessFile("jedis.bin", "r")) {
    raf.seek(16);
    String val = raf.readUTF();
    System.out.println(val); // prints Yoda
} catch (IOException e) {
    e.printStackTrace();
}
But this will not work if you wrote the file with a Writer class, because writers do not follow the formatting rules of the writeUTF() method.
In a case like this, the best option would be to format your binary file so that every string occupies the same amount of space (number of bytes, not number of characters, because in UTF-8 the number of bytes varies with the characters in your String), padding any record that does not need all of its space.
That way you could easily calculate the offset of a given line, because every line would occupy the same amount of space.
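A minimal sketch of that fixed-size layout, assuming a hypothetical `RECORD_SIZE` of 32 bytes per record (the 2-byte writeUTF prefix plus up to 30 bytes of text). Each record is written with writeUTF() into a zero-filled slot, so record `i` always starts at `i * RECORD_SIZE`, and readUTF() stops after the prefixed length, ignoring the padding.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

class FixedSizeRecords {

    // Hypothetical slot size: 2-byte length prefix + up to 30 bytes of text
    static final int RECORD_SIZE = 32;

    static void write(RandomAccessFile raf, int index, String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF(s);
        out.flush();
        if (bytes.size() > RECORD_SIZE) {
            throw new IllegalArgumentException("record too long: " + s);
        }
        byte[] record = new byte[RECORD_SIZE]; // zero-filled, so the tail is padding
        System.arraycopy(bytes.toByteArray(), 0, record, 0, bytes.size());
        raf.seek((long) index * RECORD_SIZE);
        raf.write(record);
    }

    static String read(RandomAccessFile raf, int index) throws IOException {
        raf.seek((long) index * RECORD_SIZE);
        return raf.readUTF(); // reads only the prefixed length, skipping the padding
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file name, separate from the variable-length jedis.bin
        try (RandomAccessFile raf = new RandomAccessFile("jedis-fixed.bin", "rw")) {
            String[] jedis = {"Luke\n", "Obiwan\n", "Yoda\n"};
            for (int i = 0; i < jedis.length; i++) {
                write(raf, i, jedis[i]);
            }
            System.out.print(read(raf, 2)); // Yoda
        }
    }
}
```

The trade-off is wasted space for short strings and a hard upper limit on record length, in exchange for O(1) seeking to any line.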