I have a repository where several files were checked in from Windows and have Unicode characters in their FILENAMES, for example AgêBean.java, GûBean.java, LêgbaBean.java, and XêviosoBean.java. When these files are checked out on a CentOS 7 system, the bytes making up the filenames are interpreted as ISO-8859-1. This breaks tools such as the Java compiler. For example, javac won’t compile the files above, because the Unicode class name, i.e. “AgêBean”, does not match the filename as decoded under ISO-8859-1, which the compiler sees as “AgÃªBean.java”. The short, ugly workaround is to rename the files, but if the renamed files are checked in, the same problems appear on the Windows side.
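To make the mismatch concrete, here is a minimal Java sketch. It assumes the repository stores the name as UTF-8 bytes and that the CentOS process decodes filenames as ISO-8859-1; the class `MojibakeDemo` and method `misreadAsLatin1` are hypothetical names used only for illustration. It encodes the intended name as UTF-8 and decodes those bytes as ISO-8859-1, which is what produces the `Ã` and `ª` in place of `ê`.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode a name as UTF-8, then decode those bytes as ISO-8859-1,
    // which maps each byte to exactly one character.
    static String misreadAsLatin1(String name) {
        byte[] utf8Bytes = name.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The intended name, with U+00EA written as a Unicode escape so this
        // file compiles regardless of the source encoding javac assumes.
        String intended = "Ag\u00eaBean.java";
        String misread = misreadAsLatin1(intended);

        System.out.println(intended + " -> " + misread);
        System.out.println(intended.length() + " chars become "
                + misread.length() + " chars");
    }
}
```

The single character `ê` is two bytes (0xC3 0xAA) in UTF-8, and ISO-8859-1 turns each of those bytes into its own character, so the 12-character name comes back as 13 characters. How the output renders also depends on the terminal's own encoding.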
So what are some better solutions? I can imagine a few, but I don’t know how to implement any of them, and Google has not been helpful so far:
A) Re-configuring my CentOS filesystem so that all filenames are UTF-8 (or UTF-16) encoded.
B) Configuring git on Linux to understand that the filenames in the repository are UTF-8 encoded but the local system uses ISO-8859-1, so all filenames need to be converted when checked in or out.
C) Configuring java (and terminals, and editors) on Linux to understand that the filenames under this directory are UTF-8 encoded, so each is decoded correctly.
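As a rough diagnostic in the direction of option C, the Java sketch below lists a directory and reports names that look like UTF-8 bytes misread as ISO-8859-1, along with the name they were probably meant to be. It assumes an OpenJDK-style JVM, where the non-standard, implementation-specific system property `sun.jnu.encoding` reports the charset used to decode filenames (on Linux it is derived from the process locale). The class `FilenameCheck` and helper `recoverUtf8Name` are hypothetical names for illustration; this only reports, it does not rename anything or make javac accept the files.

```java
import java.io.File;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class FilenameCheck {
    // Hypothetical helper: undo an ISO-8859-1 misreading of a UTF-8 filename.
    // Returns null if the name could not have been produced that way.
    static String recoverUtf8Name(String misread) {
        if (!StandardCharsets.ISO_8859_1.newEncoder().canEncode(misread)) {
            return null; // has chars outside ISO-8859-1, so it was not misread as Latin-1
        }
        // ISO-8859-1 maps chars 0-255 back to single bytes, so this is lossless.
        byte[] raw = misread.getBytes(StandardCharsets.ISO_8859_1);
        try {
            // A fresh decoder reports malformed input instead of substituting U+FFFD.
            return StandardCharsets.UTF_8.newDecoder()
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            return null; // the bytes are not valid UTF-8
        }
    }

    public static void main(String[] args) {
        // OpenJDK-specific property; may be null or absent on other JVMs.
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));

        File dir = new File(args.length > 0 ? args[0] : ".");
        String[] names = dir.list();
        if (names == null) {
            System.err.println("Cannot list " + dir);
            return;
        }
        for (String name : names) {
            String fixed = recoverUtf8Name(name);
            // Pure-ASCII names decode to themselves and are skipped.
            if (fixed != null && !fixed.equals(name)) {
                System.out.println(name + " -> " + fixed);
            }
        }
    }
}
```

Run as `java FilenameCheck path/to/src`. If `sun.jnu.encoding` prints `ISO-8859-1` and names like `AgÃªBean.java` are reported, that points at the process locale rather than the files themselves.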
I’d be happiest with solution “A”, but so far I have not found out how to do that. I hope it’s not compiled into the CentOS 7 (or RHEL 8) kernel.