3
FileInputStream fin = new FileInputStream("D:\\testout.txt");    
BufferedInputStream bin = new BufferedInputStream(fin);    
int i;    
while((i = bin.read())!=-1) {    
    System.out.print((char)i);    
}    

bin.close();    
fin.close();    

output: ÿþGreat

I have checked the file testout.txt; it contains only one word, i.e. Great.

Assafs

4 Answers

2

When you're reading text, you should use a Reader, e.g.:

// The charset passed here must match the encoding the file was saved in.
try (BufferedReader reader = Files.newBufferedReader(
        Paths.get("D:\\testout.txt"),
        StandardCharsets.UTF_8)) {
    int i;
    while ((i = reader.read()) != -1) {
        System.out.print((char) i);
    }
}
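
The ÿþ prefix in the question corresponds to the bytes FF FE, so the file may well be UTF-16 rather than UTF-8. Below is a minimal sketch of the same Reader approach under that assumption. The class name Utf16ReadSketch, the method readUtf16, and the temporary demo file are hypothetical stand-ins for the real D:\testout.txt.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class Utf16ReadSketch {

    // Reads the whole file as text, decoding it as UTF-16.
    // StandardCharsets.UTF_16 uses a leading byte order mark to pick the
    // byte order and does not hand the mark back as a character.
    static String readUtf16(Path file) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_16)) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical demo file standing in for D:\testout.txt.
        Path file = Files.createTempFile("testout", ".txt");
        // FF FE is the little-endian UTF-16 BOM, followed by "Great".
        byte[] bytes = {(byte) 0xFF, (byte) 0xFE, 'G', 0, 'r', 0, 'e', 0, 'a', 0, 't', 0};
        Files.write(file, bytes);
        System.out.println(readUtf16(file)); // prints Great
        Files.delete(file);
    }
}
```

If the file really is UTF-8, the same loop works with StandardCharsets.UTF_8; the point is only that the decoder has to match the bytes on disk.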
matt
1

That is most probably the byte order mark (BOM), which is optional but allowed in files that use the UTF-8 character encoding. Some programs (e.g. Notepad) account for it; others don't. Java does not strip it by default.

One utility to solve this is the BOMInputStream from Apache Commons IO.

Also, Notepad will write the byte order mark in the file when you save it as UTF-8.
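
If pulling in Commons IO is not an option, the same idea can be sketched with java.io alone. This is a minimal sketch, not the BOMInputStream implementation: the class BomSkipSketch and the methods skipUtf8Bom and readText are hypothetical names, and it only handles the three-byte UTF-8 BOM (EF BB BF).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class BomSkipSketch {

    // Returns a stream positioned just after a leading UTF-8 BOM,
    // or at the very start if the input does not begin with one.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // Read up to three bytes; a single read() may return fewer.
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        boolean isBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!isBom && n > 0) {
            // Not a BOM: put the bytes back so the caller sees them.
            pushback.unread(head, 0, n);
        }
        return pushback;
    }

    // Decodes the stream as UTF-8 text after dropping any leading BOM.
    static String readText(InputStream in) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(skipUtf8Bom(in), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'G', 'r', 'e', 'a', 't'};
        System.out.println(readText(new ByteArrayInputStream(withBom))); // prints Great
    }
}
```

For a file, the same helper could wrap a FileInputStream instead of the in-memory array used here.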

Jiri Tousek
1

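ÿþ is the byte order mark in UTF-16: the bytes FF FE mark little-endian UTF-16, and they appear as ÿþ when each byte is printed as a separate character. You can convert your string to UTF-8 with java.io as explained here.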

You may also refer to the answer for more detail.
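
As a rough illustration of such a conversion with java.io, the sketch below decodes a file as UTF-16 and writes it back out as UTF-8. The class name Utf16ToUtf8Sketch and the method convert are hypothetical, and it assumes the source file really is UTF-16 with a BOM, as the ÿþ prefix suggests.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class Utf16ToUtf8Sketch {

    // Copies the text of a UTF-16 file into a new UTF-8 file.
    // The UTF-16 decoder consumes the BOM, and Java's UTF-8 encoder
    // does not write one, so the target contains only the text bytes.
    static void convert(File source, File target) throws IOException {
        try (Reader in = new InputStreamReader(
                     new FileInputStream(source), StandardCharsets.UTF_16);
             Writer out = new OutputStreamWriter(
                     new FileOutputStream(target), StandardCharsets.UTF_8)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: Utf16ToUtf8Sketch <utf16-source> <utf8-target>");
            return;
        }
        convert(new File(args[0]), new File(args[1]));
    }
}
```

After the conversion, the byte-by-byte loop from the question would print the text without the ÿþ prefix, since the target file no longer starts with a BOM.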

Damith
0

Please use the UTF-8 character encoding to resolve this kind of issue:

byte[] utf_8 = input.getBytes("UTF-8"); // encode the Unicode string as UTF-8 bytes
String test = new String(utf_8, "UTF-8"); // decode those bytes with the same charset