I am using Saxon from a Python script by way of a subprocess:

subprocess.call(rf"java -cp C:\saxon\SaxonHE10-6J\saxon-he-10.6.jar net.sf.saxon.Transform -t -s:{input} -xsl:{xslt} -o:{output}")

(Ref: Use saxon with python)
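
One way to take command-line quoting out of the picture is to pass the command as a list of arguments instead of a single formatted string. The following is a minimal sketch, assuming the same jar location as in the command above; `build_saxon_command` and the example file names are hypothetical, and it is not claimed to resolve the "é" problem by itself:

```python
import subprocess
from pathlib import Path

# Jar location copied from the command above; adjust to your installation.
SAXON_JAR = r"C:\saxon\SaxonHE10-6J\saxon-he-10.6.jar"


def build_saxon_command(source, stylesheet, result, jar=SAXON_JAR):
    """Return the Saxon command as an argument list (hypothetical helper)."""
    return [
        "java", "-cp", str(jar), "net.sf.saxon.Transform", "-t",
        f"-s:{source}", f"-xsl:{stylesheet}", f"-o:{result}",
    ]


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    cmd = build_saxon_command(Path("in.xml"), Path("style.xsl"), Path("out.xml"))
    print(cmd)
    # subprocess.run(cmd, check=True)  # uncomment where Java and Saxon are installed
```

With a list, each element reaches the Java process as one argument, so a space inside a file name does not need manual quoting.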

It works fine, except with input filenames containing non-ASCII characters such as "é" (as in "illustré"), as this error message shows:

Saxon-HE 10.6J from Saxonica
Java version 17.0.1
Source file U:\collections\17_01_2019_illustré\illus_edited (1).xml does not exist

How can I fix this?

silfer1200
  • Is that on Windows? If you call `java` directly in a cmd window with that file name, does Saxon find the file or does it give the same error? Are you sure it is the "é" and not the space between "edited" and "(1)"? – Martin Honnen Jan 25 '22 at 18:53
  • I think the exception comes from https://saxonica.plan.io/projects/saxon/repository/he/revisions/master/entry/latest10/hej/net/sf/saxon/trans/CommandLineOptions.java#L736 so it looks as if, under Java, `new File("U:\collections\17_01_2019_illustré\illus_edited (1).xml")` doesn't find the file. To check whether that is caused by Python, or by Python doing a `subprocess.call` to Java, I would first check whether Saxon itself, when run with Java from the command line, finds the file as you expect. – Martin Honnen Jan 25 '22 at 19:08
  • Use of non-ASCII characters in Windows filenames is fraught with problems; see for example https://learn.microsoft.com/en-us/windows/win32/intl/character-sets-used-in-file-names. My advice would be to avoid it. But if you have to make it work, try to isolate whether the problem is occurring at the Java-Windows interface or at the Python-Java interface. It could be either. – Michael Kay Jan 25 '22 at 19:55
  • Thanks for your input. I tried what you suggested. From the command line, the error also occurs. When I replace "é" with "e" in my path, the error does not occur and Saxon works fine; by the way, spaces in the path are no problem. Any other ideas? I have many, many files to saxonize and I do not intend to change all my filenames. – silfer1200 Jan 26 '22 at 08:47
  • I did some tests on Windows 10 with Java 11 and Saxon 10.6, using file names and directory names containing accented characters, and at least from the command line Saxon finds all the files fine on my system. – Martin Honnen Jan 26 '22 at 10:26
  • Thanks for the trial, @MartinHonnen. Following your test, I installed Saxon on another PC (Saxon-HE 10.6J from Saxonica, Java version 15.0.2, Windows 10) and the problem no longer occurs. Thanks to all for the comments and advice. – silfer1200 Jan 26 '22 at 19:07
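
The isolation steps suggested in the comments can be sketched roughly as follows. This only collects evidence: whether Python itself sees the file, which characters in the path are non-ASCII, and which encoding-related settings the JVM reports. It assumes `java` is on the PATH; `describe_path` and `jvm_encoding_settings` are hypothetical helper names, not part of Saxon, and the thread never established the actual cause.

```python
import shutil
import subprocess
from pathlib import Path


def describe_path(path):
    """Report whether Python sees the file and which characters are non-ASCII."""
    text = str(path)
    return {
        "exists": Path(text).exists(),
        "non_ascii": [ch for ch in text if ord(ch) > 127],
        "utf8_bytes": text.encode("utf-8"),
    }


def jvm_encoding_settings():
    """Return the encoding-related lines of `java -XshowSettings:properties -version`."""
    if shutil.which("java") is None:
        return []  # Java not on PATH; nothing to report
    proc = subprocess.run(
        ["java", "-XshowSettings:properties", "-version"],
        capture_output=True, text=True, errors="replace",
    )
    # The settings are normally written to stderr; scan both streams to be safe.
    lines = (proc.stderr + proc.stdout).splitlines()
    return [line.strip() for line in lines if "encoding" in line]


if __name__ == "__main__":
    # Path taken from the error message in the question.
    print(describe_path(r"U:\collections\17_01_2019_illustré\illus_edited (1).xml"))
    print(jvm_encoding_settings())
```

If `exists` is true in Python while Saxon still reports the file as missing, the mismatch is more likely on the Java side; comparing the reported encoding lines between a working and a failing machine may help narrow it down.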
