31

I have a problem reading a text file with UTF-8 encoding. I'm using Java on the NetBeans 7.2.1 platform.

I already configured the Java project to handle UTF-8: Java project ==> right click ==> Properties ==> Source ==> UTF-8

But I still get unknown characters in the output: ����� �������� ���� �

The code:

File fileDirs = new File("C:\\file.txt");

BufferedReader in = new BufferedReader(
new InputStreamReader(new FileInputStream(fileDirs), "UTF-8"));

String str;

while ((str = in.readLine()) != null) {
    System.out.println(str);
}

Any other ideas?

Thanks

Subhrajyoti Majumder
Abrial
  • What is the encoding of `System.out`? What's your system encoding? – Mike Samuel Feb 17 '13 at 05:13
  • Are you sure, the input file is UTF-8 encoded? – Henry Feb 17 '13 at 06:49
  • 3
    Thank you all for your comments. I found the solution to the problem: the text file used ANSI encoding with Arabic characters. To solve it: `BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(fileDirs), "windows-1256"));` Thanks, all. – Abrial Feb 17 '13 at 12:40
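The cause described in the comment above can be reproduced with a minimal sketch. It is not from the original thread: the class name `CharsetMismatchDemo`, the helper `decode`, and the sample word are hypothetical, and it assumes the JVM ships the `windows-1256` charset (standard desktop JDKs normally do). It encodes an Arabic word as windows-1256 bytes and decodes them both ways.

```java
import java.nio.charset.Charset;

class CharsetMismatchDemo {
    // Decodes raw bytes with the named charset; malformed input is replaced by U+FFFD.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Hypothetical sample: an Arabic word written with Unicode escapes,
        // so the encoding of this source file does not matter.
        String original = "\u0645\u0631\u062D\u0628\u0627";

        // Simulate the "ANSI" file content: one byte per Arabic letter.
        byte[] ansiBytes = original.getBytes(Charset.forName("windows-1256"));

        String wrong = decode(ansiBytes, "UTF-8");        // mismatched charset
        String right = decode(ansiBytes, "windows-1256"); // matching charset

        System.out.println("UTF-8 decoding contains U+FFFD: " + (wrong.indexOf('\uFFFD') >= 0));
        System.out.println("windows-1256 decoding matches:  " + right.equals(original));
    }
}
```

The bytes are not valid UTF-8 sequences, so the UTF-8 decoder substitutes the replacement character, which is what shows up as � in the output.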

5 Answers

43

Use

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.UnsupportedEncodingException;

    public class test {
        public static void main(String[] args) {
            try {
                File fileDir = new File("PATH_TO_FILE");

                BufferedReader in = new BufferedReader(
                    new InputStreamReader(new FileInputStream(fileDir), "UTF-8"));

                String str;

                while ((str = in.readLine()) != null) {
                    System.out.println(str);
                }

                in.close();
            } catch (UnsupportedEncodingException e) {
                System.out.println(e.getMessage());
            } catch (IOException e) {
                System.out.println(e.getMessage());
            } catch (Exception e) {
                System.out.println(e.getMessage());
            }
        }
    }

You need to put UTF-8 in quotes, i.e. pass the charset name as the string "UTF-8".

dj_universe
Shobhit Sharma
  • 1
    It is bad practice to put `in.close()` before the catch; it should be in a finally block. You can also use the multi-catch format in Java 7 and later. – tgkprog May 22 '17 at 19:25
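One way to act on the comment above is try-with-resources (Java 7+), which closes the reader as an implicit finally block would. This is a hedged sketch rather than the answerer's code: the class `ReadWithAutoClose` and the helper `readLines` are hypothetical names, and `PATH_TO_FILE` is the same placeholder used in the answer.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

class ReadWithAutoClose {
    // Reads all lines with the named charset; the reader is closed
    // automatically, even if readLine() throws.
    static List<String> readLines(File file, String charsetName) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charsetName))) {
            String str;
            while ((str = in.readLine()) != null) {
                lines.add(str);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        try {
            for (String line : readLines(new File("PATH_TO_FILE"), "UTF-8")) {
                System.out.println(line);
            }
        } catch (IOException e) {
            // UnsupportedEncodingException is a subclass of IOException,
            // so a single catch clause covers both.
            System.out.println(e.getMessage());
        }
    }
}
```

Because `UnsupportedEncodingException` extends `IOException`, the two cannot be combined in one multi-catch clause; catching `IOException` alone is enough here.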
12

You need to specify the encoding of the InputStreamReader using the Charset parameter.

Charset inputCharset = Charset.forName("ISO-8859-1");
InputStreamReader isr = new InputStreamReader(fis, inputCharset);

This works for me. I hope it helps you.

jinkal
10

You are reading the file correctly, but the problem seems to be with the default encoding of System.out. Try this to print the UTF-8 string:

PrintStream out = new PrintStream(System.out, true, "UTF-8");
out.println(str);
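To see what the wrapping stream changes, here is a small sketch under stated assumptions: `PrintStreamEncodingDemo` and `encodeViaPrintStream` are hypothetical names, and the sample letter is made up. It captures the bytes a `PrintStream` emits for the same text under two charsets, then applies the same wrapping to `System.out`.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class PrintStreamEncodingDemo {
    // Returns the bytes a PrintStream writes for the text under the named charset.
    static byte[] encodeViaPrintStream(String text, String charsetName)
            throws UnsupportedEncodingException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(sink, true, charsetName);
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String str = "\u0645"; // hypothetical sample: a single Arabic letter

        // UTF-8 can represent the letter; ISO-8859-1 cannot and substitutes '?'.
        System.out.println("UTF-8 byte count:      " + encodeViaPrintStream(str, "UTF-8").length);
        System.out.println("ISO-8859-1 byte count: " + encodeViaPrintStream(str, "ISO-8859-1").length);

        // Same wrapping as in the answer, applied to the real console stream.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(str);
    }
}
```

Note that this only controls the bytes Java writes. The terminal or IDE output window still has to interpret those bytes as UTF-8 for the text to display correctly.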
MoveFast
4

I ran into the same problem: every time it finds a special character, it marks it as ��. To solve this, I tried using the ISO-8859-1 encoding:

BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream("txtPath"), "ISO-8859-1"));

String line;
while ((line = br.readLine()) != null) {
    // process each decoded line here
}

I hope this helps anyone who sees this post.

4

OK, I am definitely late to the party, but if you are still looking for an optimal solution, I would use the following (for Java 8):

    Charset inputCharset = Charset.forName("ISO-8859-1");
    Path pathToFile = ....
    try (BufferedReader br = Files.newBufferedReader( pathToFile, inputCharset )) {
        ...
     }
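A self-contained version of this approach might look like the sketch below. It does not fill in the elided path from the answer: the class `NioReadDemo`, the helper `readLines`, and the temporary file are hypothetical, created only so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class NioReadDemo {
    // Reads every line of the file, decoding it with the given charset.
    static List<String> readLines(Path pathToFile, Charset inputCharset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = Files.newBufferedReader(pathToFile, inputCharset)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Charset inputCharset = Charset.forName("ISO-8859-1");

        // Hypothetical input: a temporary file written in ISO-8859-1.
        Path pathToFile = Files.createTempFile("latin1-sample", ".txt");
        Files.write(pathToFile, "caf\u00e9\nni\u00f1o\n".getBytes(inputCharset));
        try {
            for (String line : readLines(pathToFile, inputCharset)) {
                System.out.println(line);
            }
        } finally {
            Files.deleteIfExists(pathToFile);
        }
    }
}
```

One difference from `InputStreamReader` worth knowing: the reader returned by `Files.newBufferedReader` throws an `IOException` (typically `MalformedInputException`) on bytes that are invalid for the charset instead of silently substituting �, so a wrong charset guess tends to fail loudly.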
7dr3am7