
Content packages exported from Sling / AEM with Jackrabbit FileVault follow filename escaping rules: characters that are not allowed in file names are encoded using URL encoding. The rules do not say which charset is used for that, though. On my macOS and Linux systems it appears to be ISO-8859-1, or possibly Windows-1252 or something similar. Is that always the case, or is it just the rightfully dreaded system default charset? Thanks!

Dr. Hans-Peter Störr

1 Answer


As it turns out, Vault FS URL-escapes only the characters <>|"/?: which are not valid in Windows file names. All other characters are left unescaped. You can therefore easily run into trouble with characters outside US-ASCII, because some ZIP programs and some filesystems do not support Unicode. So it is probably unwise to use non-US-ASCII characters in JCR node names that end up as file names in a content package, such as folders, pages and files. :-(
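To illustrate the behaviour described above, here is a minimal sketch of such an escaping step. It is not the FileVault source: the class and method names are hypothetical, the character set is copied from the list above, and the two-digit lowercase hex format is an assumption.

```java
public class EscapeSketch {
    // Characters taken from the answer text; NOT read from the FileVault source.
    private static final String ESCAPED = "<>|\"/?:";

    // Percent-escapes only the listed characters; everything else,
    // including non-ASCII characters, is copied through unchanged.
    public static String escape(String name) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (ESCAPED.indexOf(c) >= 0) {
                // Assumed format: '%' followed by two lowercase hex digits.
                out.append('%').append(String.format("%02x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a?b"));
        // Non-ASCII input is passed through as-is, so no charset is involved here.
        System.out.println(escape("St\u00f6rr").equals("St\u00f6rr"));
    }
}
```

Because characters outside the list are never touched, no charset comes into play on the escaping side in this sketch; the non-ASCII characters simply land in the file name as they are.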

On the decoding side, a percent-encoded sequence is decoded directly into a Java char. Percent-encoded Unicode characters below 256 would therefore work, which coincides with ISO-8859-1, but there is no way to percent-encode characters outside that range.
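The decoding behaviour described above can be sketched as follows. This is an illustration under assumptions, not the FileVault code: the class and method names are hypothetical, and the sketch assumes each escape is exactly `%` plus two hex digits.

```java
public class UnescapeSketch {
    // Turns each "%xx" directly into the char with that code point (0..255).
    // No charset decoding takes place, so multi-byte sequences are not combined.
    public static String unescape(String name) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < name.length()) {
            char c = name.charAt(i);
            if (c == '%' && i + 2 < name.length()) {
                int hi = Character.digit(name.charAt(i + 1), 16);
                int lo = Character.digit(name.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.append((char) (hi * 16 + lo));
                    i += 3;
                    continue;
                }
            }
            // Not a valid escape: copy the character unchanged.
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("a%3fb"));
        // "%f6" becomes U+00F6, the same value ISO-8859-1 assigns to that byte.
        System.out.println(unescape("St%f6rr").equals("St\u00f6rr"));
        // A UTF-8 style "%c3%b6" yields two separate chars, not one.
        System.out.println(unescape("%c3%b6").length());
    }
}
```

With two hex digits per escape, the largest value that can be expressed is 255, which is why characters beyond that range cannot be represented this way.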

Dr. Hans-Peter Störr