I'm running a script on WSL Debian that fetches Windows files from a locally mounted share drive. The problem is that the file names look wrongly encoded, even though #encoding returns #<Encoding:UTF-8>. Example:
"J\u00E9r\u00E9my".encoding # #<Encoding:UTF-8>
\u00E9 is the Unicode character for é, so I assume that the encoding is Unicode.
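A minimal sketch of how the string above could be examined beyond #encoding, by looking at its code points and raw bytes. The helper name describe_string is hypothetical and not part of the original script; it only uses core String methods.

```ruby
# Hypothetical helper (not from the original script): summarises how Ruby
# stores a string, independent of how a terminal or #inspect renders it.
def describe_string(str)
  {
    encoding:   str.encoding.name,
    valid:      str.valid_encoding?,
    codepoints: str.codepoints,
    bytes_hex:  str.bytes.map { |b| format("%02X", b) }.join(" ")
  }
end

# A literal containing \u escapes is always a UTF-8 string.
info = describe_string("J\u00E9r\u00E9my")

puts info[:encoding]   # UTF-8
puts info[:valid]      # true
puts info[:bytes_hex]  # 4A C3 A9 72 C3 A9 6D 79
```

If the bytes for é show up as C3 A9 and valid_encoding? is true, the string itself is well-formed UTF-8, which suggests looking at how it is displayed rather than at its content.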
I've tried several encoding combinations from related questions (Convert a unicode string to characters in Ruby?, How to convert a string to UTF8 in Ruby), but none of them fit my needs.
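For reference, the two operations those related questions usually combine are force_encoding, which only relabels the existing bytes, and encode, which transcodes them. The sketch below is purely illustrative: the ISO-8859-1 input is an assumption made for the example, and the post does not say which conversions were actually tried.

```ruby
# Assumed input for illustration: raw bytes in which "e acute" is the single
# byte E9, as it would be in ISO-8859-1.
latin1_bytes = "J\xE9r\xE9my".b

# force_encoding changes the label only; the bytes stay the same.
relabelled = latin1_bytes.dup.force_encoding(Encoding::ISO_8859_1)

# encode rewrites the bytes so the same characters are expressed in UTF-8.
utf8 = relabelled.encode(Encoding::UTF_8)

# Relabelling the same bytes as UTF-8 without transcoding yields an
# invalid string, because a lone E9 byte is not valid UTF-8.
mislabelled = latin1_bytes.dup.force_encoding(Encoding::UTF_8)

puts relabelled.bytes == latin1_bytes.bytes  # true
puts utf8.codepoints.inspect                 # [74, 233, 114, 233, 109, 121]
puts mislabelled.valid_encoding?             # false
```

None of this helps when the string is already valid UTF-8, since there is then nothing to relabel or transcode.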
I've also tried different "magic comments" (# encoding: <ENCODING>), without a satisfying result.
What's your methodology to identify and fix encoding issues?
Edit1: Stefan asked for codepoints:
"J\u00E9r\u00E9my".each_codepoint.to_a
# [74, 233, 114, 233, 109, 121]
and Encoding.default_external:
Encoding.default_external
# #<Encoding:US-ASCII>
This surprises me, as I have the magic comment # encoding: utf-8 at the top of my file.
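The magic comment and Encoding.default_external are separate settings: the comment declares the encoding of the source file itself, while default_external is chosen at startup, normally from the locale environment. The sketch below prints both side by side, assuming the usual LC_ALL, LC_CTYPE and LANG variables are what the locale is derived from on the system in question.

```ruby
# encoding: utf-8

# Encoding of this source file, which is what the magic comment controls.
source_encoding = __ENCODING__

# Default encoding for data coming from outside the program (files, pipes,
# STDIN). Ruby derives it from the locale at startup, not from the comment.
external_encoding = Encoding.default_external

# The encoding Ruby detected for the current locale.
locale_encoding = Encoding.find("locale")

# Environment variables that usually determine the locale (nil when unset).
locale_vars = ENV.values_at("LC_ALL", "LC_CTYPE", "LANG")

puts "source encoding:   #{source_encoding}"
puts "default_external:  #{external_encoding}"
puts "locale encoding:   #{locale_encoding}"
puts "LC_ALL, LC_CTYPE, LANG: #{locale_vars.inspect}"
```

If all three variables are unset, or set to C or POSIX, a US-ASCII default_external would be the expected outcome regardless of the magic comment.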
Edit2: explicitly setting default_internal and default_external to Encoding::UTF_8 fixes the problem:
# encoding: utf-8
Encoding.default_internal = Encoding::UTF_8
Encoding.default_external = Encoding::UTF_8
Still, I'd like to go further and actually understand why this is required.
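One plausible reading, consistent with Edit2 but not confirmed anywhere in this post: String#inspect builds its result in default_internal, or in default_external when default_internal is nil, and escapes every character that encoding cannot hold. Under US-ASCII a valid UTF-8 é is therefore shown as \u00E9 even though the string is intact. The sketch below reproduces that; inspect_under is a hypothetical helper that temporarily swaps the defaults and restores them afterwards.

```ruby
# Hypothetical helper: returns str.inspect as it would look with the given
# default_external (and no default_internal), then restores the old values.
def inspect_under(external, str)
  saved_external = Encoding.default_external
  saved_internal = Encoding.default_internal
  Encoding.default_external = external
  Encoding.default_internal = nil
  str.inspect
ensure
  Encoding.default_external = saved_external
  Encoding.default_internal = saved_internal
end

name = "J\u00E9r\u00E9my"

as_ascii = inspect_under(Encoding::US_ASCII, name)
as_utf8  = inspect_under(Encoding::UTF_8, name)

puts as_ascii                  # "J\u00E9r\u00E9my"
puts as_ascii.ascii_only?      # true: the accents were replaced by escapes
puts as_utf8.ascii_only?       # false: the accented characters are kept
puts name.bytes == name.dup.bytes  # true: the string itself never changed
```

Under this reading the file names were never corrupted; only their inspected form changed, which would explain why setting the defaults to UTF-8 made the output look right.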