4

I'm trying to read some French characters from a file, but I get strange symbols whenever a word contains à, é or è. Can anyone explain how to get the actual characters from the file? Here is my main method:

public static void main(String args[]) throws IOException {
    char current, org;

    //String strPath = "C:/Documents and Settings/tidh/Desktop/BB/hhItem01_2.txt";
    String strPath = "C:/Documents and Settings/tidh/Desktop/hhItem01_1.txt";
    InputStream fis;

    fis = new BufferedInputStream(new FileInputStream(strPath));

    while (fis.available() > 0) {
        current = (char) fis.read(); // to read character from file
        int ascii = (int) current;   // to get ascii for the character
        org = (char) (ascii);
        System.out.println(org);
    }
}
Vikash Kumar

4 Answers

2

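Your file is probably UTF-8 encoded, but the code reads it one byte at a time and casts each byte to a char, as if it were ASCII. In UTF-8, characters such as à, é and è take two bytes each, so they come out as garbage. Here's an example that decodes the file as UTF-8 instead: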

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class Test {
    private static final String FILE_PATH = "c:\\temp\\test.txt";

    public static void main(String[] args) {
        File fileDir = new File(FILE_PATH);

        // Decode the bytes as UTF-8 instead of casting each byte to a char.
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(
                        new FileInputStream(fileDir), "UTF-8"))) {
            String str;

            while ((str = in.readLine()) != null) {
                System.out.println(str);
            }
        } catch (IOException e) {
            // Also covers UnsupportedEncodingException, a subclass of IOException.
            System.out.println(e.getMessage());
        }
    }
}
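
To see why the byte-by-byte cast in the question fails, here is a minimal sketch that uses an in-memory byte array rather than the real file. The class and method names (ByteVsChar, castEachByte, decodeUtf8) are made up for illustration, and it assumes the file is UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class ByteVsChar {
    // Mimics the question's loop: every byte becomes one char.
    static String castEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF)); // same 0..255 value InputStream.read() returns
        }
        return sb.toString();
    }

    // Decodes the whole byte sequence as UTF-8.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The word from the comments below; the cedilla is written as an escape
        // so the encoding of this source file does not matter.
        byte[] bytes = "fran\u00e7ais".getBytes(StandardCharsets.UTF_8);
        System.out.println(bytes.length);                 // 9: the c-cedilla takes two bytes
        System.out.println(castEachByte(bytes).length()); // 9 chars, one per byte
        System.out.println(decodeUtf8(bytes).length());   // 8 chars, the original word
    }
}
```

The cast produces two wrong characters where the decoder produces one correct one, which is the garbage seen in the question.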

Reference: How to read UTF-8 encoded data from a file

Positive Navid
dawidklos
  • I tried what you suggested, but it did not fix my issue. The word in my file is "français"; the output I got is "fran?ais". – Vikash Kumar Sep 25 '15 at 15:09
  • Java encodes as UTF-8; the console decodes as IBM850. Run with java -Dfile.encoding=UTF-8 – dawidklos Sep 25 '15 at 15:17
  • For more information see http://stackoverflow.com/questions/24803733/default-character-encoding-for-java-console-output – dawidklos Sep 25 '15 at 15:17
  • Your issue with output is a mirror of your issue with input: use the correct encoding. See these answers: http://stackoverflow.com/a/11868911/1172714 and http://stackoverflow.com/a/17551962/1172714 – dsh Sep 25 '15 at 15:19
  • @VikashKumar if you view the result on the command line, you will likely see "fran?ais", as the console often does not use UTF-8 by default. Try testing it in an IDE (such as Eclipse) or by writing the output to a file: https://www.mkyong.com/java/how-to-write-utf-8-encoded-data-into-a-file-java/ – Positive Navid Oct 01 '16 at 16:45
1

You can download the Apache Commons IO jar and read the file line by line rather than char by char:

List<String> lines = IOUtils.readLines(fis, "UTF8");

for (String line : lines) {
    dbhelper.addDataRecord(line + ",'" + strCompCode + "'");
}
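
If adding a jar is not an option, the JDK's own Files.readAllLines (Java 7 and later) does much the same thing. This is a sketch of that JDK-only alternative, not the Commons IO call above; the class name ReadLinesDemo, the helper readLines and the temporary file are made up for illustration, and UTF-8 is assumed.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class ReadLinesDemo {
    // Reads the whole file into memory, decoding it with the given charset.
    static List<String> readLines(Path path, Charset charset) throws IOException {
        return Files.readAllLines(path, charset);
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the real input file.
        Path tmp = Files.createTempFile("lines", ".txt");
        try {
            Files.write(tmp, Arrays.asList("fran\u00e7ais", "\u00e9t\u00e9"), StandardCharsets.UTF_8);
            for (String line : readLines(tmp, StandardCharsets.UTF_8)) {
                System.out.println(line);
            }
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Like IOUtils.readLines, this loads every line at once, so it suits small files better than very large ones.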
Pranesh Sahu
0

The following assumes the text is in Windows Latin-1 (Windows-1252); ISO-8859-1 and UTF-8 are included as commented-out alternatives.

private static final String FILE_PATH = "c:\\temp\\test.txt";

Path path = Paths.get(FILE_PATH);
//Charset charset = StandardCharsets.ISO_8859_1;
//Charset charset = StandardCharsets.UTF_8;
Charset charset = Charset.forName("Windows-1252");
try (BufferedReader in = Files.newBufferedReader(path, charset)) {
    String line;
    while ((line = in.readLine()) != null) {
        System.out.println(line);
    }
}

The String line will contain the text in Unicode. Whether it then displays correctly depends on whether System.out can convert that Unicode to your system encoding.

System.out.println("My encoding is: " + System.getProperty("file.encoding"));

If you picked the correct encoding, you will see at most one ? per special character. If you see more than one per special character, the file is probably in UTF-8, a multi-byte encoding.

Pick a Unicode-capable font for the console too.
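
If System.out uses the wrong encoding, one option is to wrap it in a PrintStream with an explicit charset. This is a sketch only: the class name Utf8Out and the helper utf8 are made up, and it assumes the console itself is set to display UTF-8 (on Windows, for example, via chcp 65001), which this code cannot guarantee.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Out {
    // Wraps a stream in an autoflushing PrintStream that encodes text as UTF-8.
    static PrintStream utf8(OutputStream out) throws UnsupportedEncodingException {
        return new PrintStream(out, true, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        PrintStream out = utf8(System.out);
        // Escape used so the encoding of this source file does not matter.
        out.println("fran\u00e7ais");
    }
}
```

This only controls which bytes Java sends; the console still has to decode them with the same encoding.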

To check whether é was actually read:

String e = "\u00e9";
String s = new String(Files.readAllBytes(path), charset);
System.out.println("Contains e´ : " + s.contains(e));
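
A related check is whether the bytes are well-formed UTF-8 at all. The sketch below uses a strict CharsetDecoder for that; the class name Utf8Check and the method isValidUtf8 are made up for illustration, and in real use the bytes would come from Files.readAllBytes(path). A file that fails this check is certainly not UTF-8; a file that passes may still be plain ASCII.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Returns true if the bytes form well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)       // throw instead of substituting
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00e9".getBytes(StandardCharsets.UTF_8);        // two bytes: C3 A9
        byte[] latin1 = "\u00e9".getBytes(StandardCharsets.ISO_8859_1); // one byte: E9
        System.out.println(isValidUtf8(utf8));   // true
        System.out.println(isValidUtf8(latin1)); // false
    }
}
```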

After comment:

Better to use Files.newBufferedReader (which I corrected above), as it is equivalent to the following.

try (BufferedReader in = new BufferedReader(
         new InputStreamReader(
             new FileInputStream(file), charset))) {

This buffers for faster reading, and the InputStreamReader takes a binary InputStream plus a charset and converts the bytes to the (Unicode) chars of a Reader.
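
For the char-by-char loop from the question, the same idea might look like the sketch below. It keeps the BufferedInputStream but lets an InputStreamReader do the decoding. The class name CharByChar and the method readAll are made up, and an in-memory stream stands in for the real FileInputStream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharByChar {
    // Reads every character from the stream, decoding bytes with the given charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new BufferedInputStream(in), charset)) {
            int c;
            // Reader.read() returns one decoded char, or -1 at end of stream.
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for new FileInputStream(strPath); assumes UTF-8 content.
        byte[] sample = "\u00e0 \u00e9 \u00e8".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(sample), StandardCharsets.UTF_8));
    }
}
```

Checking for -1 replaces the available() test from the question, which is not a reliable end-of-file check.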

Joop Eggen
  • Thanks Joop, I really appreciate the explanation. Can you please help me apply the encoding to the code I attached, that is, how can I use this with BufferedInputStream? – Vikash Kumar Sep 25 '15 at 18:34
  • I corrected my original answer; there was a copy error, `Files.newBufferedReader` was intended. The class Files has many nice things, like reading a list of lines: `List<String>`. – Joop Eggen Sep 25 '15 at 19:44
0

The encoding specified by IBM for French is CP1252 (preferred because it runs on all operating systems).

Regards,

A frenchy guy

fabien t