0

I use Apache POI to read an Excel file that lists the paths of docx, doc, xls, and xlsx files, then decrypt each file's content and build a new path to read the data back.

The problem occurs when the path contains a French character, as in the following:

/Valérie/CASES.doxcs
is = new FileInputStream(path);

This line throws the following exception:

(No such file or directory)
at java.io.FileInputStream.open(Native Method)

It works well for other paths. Does this mean Apache POI does not support non-English characters, or is something else wrong? Is there any way to fix this?

printemp
  • `FileInputStream` isn't part of Apache POI - it's just in the Java core libraries. POI is irrelevant to that. I suggest you create a short but complete program which *just* tries to open a `FileInputStream` on the appropriate file. – Jon Skeet Jul 27 '15 at 08:00
  • @JonSkeet Thanks for the reminder, you are right. – printemp Jul 27 '15 at 08:08
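Following the suggestion in the comment above, here is a minimal sketch of such a standalone test. It tries to open the path from the question with a plain `FileInputStream` and, if that fails, lists the nearest existing parent directory together with the UTF-16 code units of each entry, so that names which look identical on screen can be compared. The class name `OpenAccentedPath` and the helper `codeUnits` are made up for this illustration, and the accented character is written as a `\u00e9` escape so that the encoding of the source file cannot affect the result.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical test class; not part of the original question.
class OpenAccentedPath {

    // Renders every UTF-16 code unit as four hex digits, separated by spaces.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("%04x ", (int) c));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Path from the question by default; pass another path as the first argument.
        String path = args.length > 0 ? args[0] : "/Val\u00e9rie/CASES.doxcs";
        System.out.println("path code units: " + codeUnits(path));

        try (InputStream is = new FileInputStream(path)) {
            System.out.println("opened, first byte = " + is.read());
        } catch (IOException e) {
            System.out.println("failed: " + e.getMessage());

            // Walk up to the nearest directory that actually exists.
            File dir = new File(path).getAbsoluteFile().getParentFile();
            while (dir != null && !dir.isDirectory()) {
                dir = dir.getParentFile();
            }
            String[] names = (dir == null) ? null : dir.list();
            if (names != null) {
                System.out.println("entries of " + dir + ":");
                for (String n : names) {
                    System.out.println("  " + n + " -> " + codeUnits(n));
                }
            }
        }
    }
}
```

If the directory entry shows `0065 0301` where the path shows `00e9` (or the reverse), the two names differ only in Unicode normalization; if the path shows something like `00c3 00a9`, the string was probably decoded with the wrong charset before it reached `FileInputStream`.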

2 Answers

1

As this is an operating system matter, you could convert paths:

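static String toFileName(String name) {
    // NFKD splits an accented letter into a base letter plus combining marks;
    // the regex then drops every non-ASCII character, including those marks.
    return java.text.Normalizer.normalize(name, java.text.Normalizer.Form.NFKD)
            .replaceAll("\\P{ASCII}", ""); //.replaceAll("[\"/\\\\]", "_");
}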

The above converts é to e and so on, by splitting each accented letter into a base letter plus combining accents and then dropping the accents. There might be better transliterations. Also consider Cyrillic and other scripts, whose letters this approach would remove entirely.
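To make the effect of the decomposition concrete, here is a small sketch using only `java.text.Normalizer`. The class name `NormalizeDemo` and the helper `stripAccents` are illustrative names, and the sample strings are written with Unicode escapes so the result does not depend on the source file encoding.

```java
import java.text.Normalizer;

// Hypothetical demo class for illustration only.
class NormalizeDemo {

    // Same idea as toFileName above: decompose, then drop everything non-ASCII.
    static String stripAccents(String name) {
        return Normalizer.normalize(name, Normalizer.Form.NFKD)
                .replaceAll("\\P{ASCII}", "");
    }

    public static void main(String[] args) {
        String composed = "Val\u00e9rie"; // é as one code point
        String decomposed = Normalizer.normalize(composed, Normalizer.Form.NFD);

        System.out.println(composed.length());            // 7
        System.out.println(decomposed.length());          // 8 (e + combining acute)
        System.out.println(composed.equals(decomposed));  // false

        System.out.println(stripAccents(composed));       // Valerie

        // A Cyrillic name has no ASCII base letters, so nothing is left.
        System.out.println(stripAccents("\u0416\u0443\u043a").isEmpty()); // true
    }
}
```

The two spellings of the name render identically but are different strings, which is one way a path can fail to match a file that visibly exists on disk.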

A nicer solution would be to move to a Linux system with UTF-8. You might still want to normalize accent usage to one unique form, say the shortest char sequence:

static String toFileName(String name) {
    return java.text.Normalizer.normalize(name, java.text.Normalizer.Form.NFKC);
}
Joop Eggen
  • Hi, I changed the workspace text file encoding to UTF-8, which is how I get the path with special characters. I tried your code, but it still does not work. Is there something wrong with my configuration? Thanks. – printemp Jul 27 '15 at 09:19
  • On a "western" system `Valérie` should pose no problem. Write a separate tiny application to test what's wrong, doing everything step-wise. If Valérie was converted wrongly somewhere, you would get exactly this kind of error. Check for wrong conversions: `new String(bytes), String.getBytes(), FileReader/FileWriter, InputStreamReader(stream), OutputStreamWriter(stream)`. – Joop Eggen Jul 27 '15 at 10:02
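As an illustration of the kind of wrong conversion mentioned in the comment above, the sketch below encodes the name as UTF-8 and decodes it with a different charset, which is what can happen when `new String(bytes)` or `getBytes()` silently falls back to a platform default that does not match the data. The class `CharsetMismatchDemo` and its methods are hypothetical names, and ISO-8859-1 is chosen here only as an example of a mismatching charset.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration only.
class CharsetMismatchDemo {

    // Encodes as UTF-8 but decodes as ISO-8859-1: the classic mismatch.
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Naming the same charset on both sides keeps the text intact.
    static String roundTrip(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "Val\u00e9rie";
        String broken = misdecode(name);

        System.out.println(broken.length());                // 8: é became two chars
        System.out.println(broken.equals(name));            // false
        System.out.println(roundTrip(name).equals(name));   // true
    }
}
```

A path damaged this way names a file that does not exist, so `FileInputStream` reports "No such file or directory" even though the intended file is present.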
0

I tried everything in "How can I open files containing accents in Java?". In most situations, setting the Eclipse workspace encoding to UTF-8 (Window -> Preferences -> General -> Workspace) and adding -Dfile.encoding=UTF-8 to the VM arguments of the project's run configuration should already solve the problem.

But if your JDK is not from Sun and you are on a Linux system, you had better run echo $LANG to make sure the locale is UTF-8, and then compile and run the Java source code from the Linux command line. Problem solved. Link for running Java code on Linux: http://www.sergiy.ca/how-to-compile-and-launch-java-code-from-command-line/
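To check which settings the JVM actually picked up, a small report like the following may help. It is only a sketch: the class name `EncodingReport` is made up, and `sun.jnu.encoding` is an implementation-specific property that, as far as I know, Oracle/OpenJDK builds use for file names, so it may print `null` on other JVMs.

```java
import java.nio.charset.Charset;

// Hypothetical helper class for illustration only.
class EncodingReport {

    // Collects the encoding-related settings visible to this JVM.
    static String report() {
        String nl = System.lineSeparator();
        return "file.encoding=" + System.getProperty("file.encoding") + nl
                // Implementation-specific; may be null on some JVMs.
                + "sun.jnu.encoding=" + System.getProperty("sun.jnu.encoding") + nl
                + "defaultCharset=" + Charset.defaultCharset().name() + nl
                // Locale environment variable checked with echo $LANG above.
                + "LANG=" + System.getenv("LANG");
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Running it once from Eclipse and once from the command line shows whether the two environments differ, which would explain why the same code behaves differently in each.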

Community
printemp