We have some Groovy scripts that we run from Git Bash (MINGW64) on Windows. Some of the scripts print the bullet character • (or similar). To make it work, we set this variable:

export LC_ALL=en_US.UTF-8

But for some people this is not enough. Their console prints ΓÇó instead of •.

Any idea how to make it print properly, and why it prints that even after setting the LC_ALL variable?

Update

The key part is that the output from the Groovy scripts is printed incorrectly, while plain bash scripts have no problems.

Agorreca
  • It could have something to do with their console or terminal character encoding settings. – Jonny Henly Oct 07 '20 at 23:24
  • `LC_ALL` affects your shell and (since it's exported) the programs running under it. It doesn't affect your terminal program, and it looks like your terminal is using [code page 437](https://en.wikipedia.org/wiki/Code_page_437) or something similar. – Gordon Davisson Oct 07 '20 at 23:48
  • If the `en_US.UTF-8` locale is not installed on the system, `export LC_ALL=en_US.UTF-8` will fail, but this failure is not detectable by testing the return code. Use `locale -a` to query the locales available on the system first. – Léa Gris Oct 08 '20 at 00:10
  • Which $TERM? Could you write down the codepoints of •? Some terminals don't handle combining characters and/or non-BMP characters (those above 0xFFFF): they say they are just UCS level 1. On my computer, `LC_ALL=en_US.UTF-8 echo "• $MSYSTEM $MINGW_CHOST"` gives me `• MINGW64 x86_64-w64-mingw32`. Could you give us an example? – Giacomo Catenazzi Oct 08 '20 at 07:14
  • I added an update: with plain bash scripts the bullet is printed fine. The problem comes with the Groovy script output. `• MINGW64 x86_64-w64-mingw32` – Agorreca Oct 08 '20 at 14:17
  • @Agorreca You could possibly pipe the groovy (gradle?) output to `| iconv -f 'UTF-8' -t "$(locale charmap)//TRANSLIT"` – Léa Gris Oct 09 '20 at 20:59
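
To illustrate the code page explanation in the comments above: the bullet is U+2022, which UTF-8 encodes as the three bytes E2 80 A2. The following is a minimal sketch, assuming an `iconv` build that recognizes the name `CP437`; the function name `mojibake` is made up for this example. It decodes those bytes as code page 437, which reproduces the ΓÇó from the question:

```shell
#!/usr/bin/env sh

# Hypothetical helper: emit the UTF-8 bytes of the bullet (octal 342 200 242,
# hex E2 80 A2), read them as code page 437, and convert the result back to
# UTF-8 so a UTF-8 terminal can show what a CP437 terminal would display.
mojibake() {
  printf '\342\200\242' | iconv -f CP437 -t UTF-8
}

mojibake   # prints ΓÇó
echo
```

In code page 437, byte E2 is Γ, byte 80 is Ç, and byte A2 is ó, so this output suggests the program is emitting valid UTF-8 and the terminal is the side decoding it differently.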

1 Answer

Here is an example that queries the character map used by the current locale with `locale charmap`, then filters the output through `recode` so that it is rendered with characters that map supports:

#!/usr/bin/env sh

cat <<EOF | recode -qf "UTF-8...$(locale charmap)"
• These are
• UTF-8 bullets in source
• But it can gracefully degrade with recode
EOF

With an ISO-8859-1 charmap, it renders as:

o These are
o UTF-8 bullets in source
o But it can gracefully degrade with recode

An alternative method uses `iconv` instead of `recode`; the results may be even better:

#!/usr/bin/env sh

cat <<EOF | iconv -f 'UTF-8' -t "$(locale charmap)//TRANSLIT"
• These are
• UTF-8 bullets followed by a non-breaking space in source
• But it can gracefully degrade with iconv
• Europe's currency sign is € for Euro.
EOF

iconv output with an fr_FR.iso-8859-15@Euro locale:

o These are
o UTF-8 bullets followed by a non-breaking space in source
o But it can gracefully degrade with iconv
o Europe's currency sign is € for Euro.
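
Both pipelines depend on what `locale charmap` reports, and, as noted in the comments on the question, `export LC_ALL=...` does not report an error when the named locale is not installed. A small sketch for checking that first; `locale_listed` is a hypothetical helper, and it assumes `locale -a` may spell the name either as `en_US.UTF-8` or as `en_US.utf8`:

```shell
#!/usr/bin/env sh

# Hypothetical helper: reads a locale list on stdin and succeeds if the name
# given as $1 is in it, ignoring case and hyphens, so that en_US.UTF-8 also
# matches a listing that spells it en_US.utf8.
locale_listed() {
  want=$(printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g')
  tr '[:upper:]' '[:lower:]' | sed 's/-//g' | grep -qxF "$want"
}

if locale -a 2>/dev/null | locale_listed en_US.UTF-8; then
  echo "en_US.UTF-8 is available"
else
  echo "en_US.UTF-8 is not listed by locale -a"
fi

# The charmap that the recode and iconv pipelines would convert to.
echo "current charmap: $(locale charmap 2>/dev/null)"
```

If the locale is not listed, the charmap printed on the last line is the one the conversion will actually target, whatever `LC_ALL` was set to.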
Léa Gris
  • Great! Sorry for the delay; it happens to other users (not me), so I needed to test it on other machines. Yes, you nailed it! Thanks. Do you know a way to set it as the default conversion (or something like that)? We are using a lot of scripts, and I don't know whether adding the iconv pipe to each one is feasible. – Agorreca Oct 09 '20 at 21:32
  • @Agorreca if you encounter the problem only with Groovy, maybe you could use the Java standard libraries to convert the character mapping to that of the system. See: [Encoding conversion in java](https://stackoverflow.com/a/229023/7939871) – Léa Gris Oct 09 '20 at 21:39
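
On the question of a default conversion: one possible approach, staying in the shell rather than in Java, is to define the `iconv` filter once and launch Groovy through a wrapper. This is only a sketch. `to_charmap` and `run_groovy` are hypothetical names, not part of Groovy or Git Bash, and it assumes the scripts emit UTF-8 and that every user sources the file that defines these functions (for example from `~/.bashrc`):

```shell
#!/usr/bin/env sh

# Hypothetical filter: convert UTF-8 on stdin to the target charmap.
# The target defaults to the charmap of the current locale; an explicit
# name can be passed as $1 instead.
to_charmap() {
  iconv -f UTF-8 -t "${1:-$(locale charmap)}//TRANSLIT"
}

# Hypothetical wrapper: run groovy with the given arguments and filter its
# standard output. Standard error is left untouched, and the pipeline
# returns the exit status of iconv, not of groovy.
run_groovy() {
  groovy "$@" | to_charmap
}
```

A script would then be started with `run_groovy some-script.groovy` (the file name is a placeholder) instead of adding the pipe to each script by hand.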