How to determine which character is used as decimal separator (radix point) or thousand separator, under current locale?

Question

Coming from this answer to Format currency in Bash, I am looking for ways to determine which characters are used as numeric separators.

There are many issues with locales and number formatting, for example:

printf '%.5f\n' $(bc -l <<<'4*a(1)')
3.14159

LANG=de_DE printf '%.5f\n' $(bc -l <<<'4*a(1)')
bash: printf: 3.14159265358979323844: invalid number
3,00000

The binary calculator bc does not seem to handle the locale: it prints a period as the radix point, which printf then rejects under de_DE.

In the answer mentioned above, I used this to find the decimal separator (radix character):

int2amount() {
    local TIMEFORMAT=%U _decsep
    read _decsep < <(eval 'time true' 2>&1)
    _decsep=${_decsep//[0-9]}
    ...
}

This works fine:

pi() {
    local TIMEFORMAT=%U _decsep
    read _decsep < <(eval 'time true' 2>&1)
    _decsep=${_decsep//[0-9]}
    local pi=$(bc -l <<<'4*a(1)')
    printf '%.5f\n' ${pi/./$_decsep}
}

pi
3.14159
LANG=de_DE pi
3,14159

The thousands separator, by contrast, is a lot easier to find:

printf -v ts "%'d" 1111 ; ts=${ts//1}

There is no fork, so the system footprint is very light.

So I could imagine something like this at the beginning of a source file:

numericSeparators() {
    local TIMEFORMAT=%U
    read NUM_DEC_SEP < <(eval 'time true' 2>&1)
    NUM_DEC_SEP=${NUM_DEC_SEP//[0-9]}
    printf -v NUM_THO_SEP "%'d" 1111
    NUM_THO_SEP=${NUM_THO_SEP//1}
}
numericSeparators
declare -r NUM_THO_SEP NUM_DEC_SEP
...

But I think <(eval 'time true' 2>&1) is heavy for this purpose. I'm looking for a lighter and/or cleaner way to determine them (ideally both the decimal and the thousands separators).

Thanks to dan's answer, my functions become simpler and quicker!

pi() {
    local _decsep pi=$(bc -l <<<'4*a(1)')
    printf -v _decsep %.1f 1
    printf '%.5f\n' ${pi/./${_decsep:1:1}}
}

pi
3.14159
LANG=de_DE.UTF-8 pi
3,14159

numericSeparators() {
    local numtest
    printf -v numtest "%'.1f" 1111
    NUM_THO_SEP=${numtest:1:1}
    NUM_THO_SEP=${NUM_THO_SEP/1}
    NUM_DEC_SEP=${numtest: -2:1}
}
numericSeparators

for loctest in   C   en_US.UTF-8   de_DE.UTF-8   ;do
    LANG=$loctest numericSeparators
    printf '  %-12s decimal: \47%s\47 thousand: \47%s\47\n' \
            "$loctest"  "$NUM_DEC_SEP"  "$NUM_THO_SEP"
done

  C            decimal: '.' thousand: ''
  en_US.UTF-8  decimal: '.' thousand: ','
  de_DE.UTF-8  decimal: ',' thousand: '.'
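
One caveat about the loop above: LANG is the lowest-priority locale variable, so if LC_ALL or LC_NUMERIC is already set in the environment, assigning LANG changes nothing. A minimal sketch of a variant that sets LC_ALL instead, assuming the requested locale is installed (check with `locale -a`); the helper name `separatorsFor` is hypothetical and not part of the code above:

```bash
# separatorsFor LOCALE: set NUM_DEC_SEP and NUM_THO_SEP for the given locale.
# LC_ALL is used instead of LANG because it overrides LC_NUMERIC and LANG.
# (separatorsFor is an illustrative name, not an existing command.)
separatorsFor() {
    local LC_ALL=$1 numtest
    printf -v numtest "%'.1f" 1111   # e.g. "1,111.0", or "1111.0" without grouping
    NUM_THO_SEP=${numtest:1:1}       # 2nd character: the separator, or the digit 1
    NUM_THO_SEP=${NUM_THO_SEP/1}     # drop the digit when there is no separator
    NUM_DEC_SEP=${numtest: -2:1}     # character just before the final digit
}

separatorsFor C
printf "decimal: '%s' thousand: '%s'\n" "$NUM_DEC_SEP" "$NUM_THO_SEP"
# prints: decimal: '.' thousand: ''
```

Because LC_ALL is declared local, the previous locale should come back when the function returns.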

@JamesBrown Oh yes! (I forgot this!) ... But `locale` is not a builtin, so the system footprint won't be better... – F. Hauri - Give Up GitHub, Jun 29 '22 at 08:50
Try `LANG=de_DE printf '%.5f\n' $(LANG=de_DE bc -l <<<'4*a(1)')`, maybe? – Renaud Pacalet, Jun 29 '22 at 09:34
@RenaudPacalet On my system, this renders an error: `bash: printf: 3.14159265358979323844: invalid number`, then `3,00000`. – F. Hauri - Give Up GitHub, Jun 29 '22 at 09:40
My guess is that `bc` ignores the locale while `bash` does not. – Renaud Pacalet, Jun 29 '22 at 09:41
@RenaudPacalet I wrote: *`for sample: ... binary calculator bc seem not handling locale correctly...`* – F. Hauri - Give Up GitHub, Jun 29 '22 at 09:43
Oh, yes, sorry, I missed that. But even if it did, don't you think that `$(LANG=de_DE bc -l <<<'4*a(1)')` would be needed instead of just `$(bc -l <<<'4*a(1)')`? – Renaud Pacalet, Jun 29 '22 at 09:45
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/246015/discussion-between-f-hauri-and-renaud-pacalet). – F. Hauri - Give Up GitHub, Jun 29 '22 at 09:45
What's the issue with `printf -v ds %.1f 1; ds=${ds//[10]}` to get decimal separator? – dan, Jun 29 '22 at 10:10
@dan It's too simple for me! Bravo! If you post this as an answer, I will accept your answer! – F. Hauri - Give Up GitHub, Jun 29 '22 at 10:18
@dan please come back to the [discussion](https://chat.stackoverflow.com/rooms/246015/discussion-between-f-hauri-and-renaud-pacalet) for tests... – F. Hauri - Give Up GitHub, Jun 29 '22 at 10:19

score 3 · Accepted Answer · answered Jun 29 '22 at 11:08

You can get the locale's radix character (decimal separator) with:

printf -v ds '%#.1f' 1
ds=${ds//[0-9]}

And the thousands grouping separator, with:

printf -v ts "%'d" 1111
ts=${ts//1}

Some locales (e.g. C) have no thousands separator, in which case $ts is empty. Conversely, if the locale does not define a radix character, POSIX (printf(3)) says it defaults to a period (.). The # flag guarantees that the radix character is printed.
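
To illustrate what the # flag does, here is a small sketch run under the C locale (an assumption made only because that locale is always available). With a precision of 0 the radix character is normally omitted, and # forces it to appear; the variable names `plain` and `forced` are illustrative only:

```bash
LC_ALL=C                        # assumption: C locale; this changes the current shell's locale
printf -v plain  '%.0f'  1      # plain  is "1"   (no radix character)
printf -v forced '%#.0f' 1      # forced is "1."  (radix character kept)
ds=${forced//[0-9]}             # strip the digits, leaving "."
printf 'plain=%s forced=%s ds=%s\n' "$plain" "$forced" "$ds"
# prints: plain=1 forced=1. ds=.
```

With the `.1f` precision used above, a digit always follows the radix character, so # acts as a safeguard there.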

answered Jun 29 '22 at 11:08

dan


Then, in one go: `LANG=$loctest printf -v var %\'5.1f 1111;thsnd=${var:1:1} radix=${var: -2:1} ;thsnd=${thsnd/1};declare -p thsnd radix` – F. Hauri - Give Up GitHub Jun 29 '22 at 11:47

score 0 · Answer 2 · answered Jul 24 '23 at 21:44

In the vast majority of cases, you don't even have to know which locale setting you're in to properly decode any value, regardless of your own locale's settings.

Because a number cannot have two radix points (RP), in any base, you can use gsub() or a similar quick counting tool to find out which of , and . occurs more than once.

If both do, the input is likely problematic to begin with.
If each occurs exactly once, the rightmost one has to be the RP.
If only one of them occurs, and only once, it is ambiguous, so consider the following:

If the number of digits to the right of that character is anything other than exactly 3 (including 0), it cannot be a thousands separator.

More likely than not, a thousands separator has digits on both sides, whereas a radix point at the leading or trailing edge of a number is not all that uncommon.

Only numbers of 4 to 6 digits in total (counting the digits after the radix point), and only when they are still ambiguous after these checks, require extra context to decode properly.
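
The rules above can be sketched in Bash as follows. This is only one interpretation of the heuristic, not the answerer's code: the function name `guess_radix` and the '?' marker for ambiguous or malformed input are hypothetical, and only , and . are considered as candidate separators.

```bash
# guess_radix NUMBER
# Prints the character guessed to be the radix point ("." or ","),
# an empty line if there is none, or "?" if the input is ambiguous or malformed.
guess_radix() {
    local num=$1 dots commas sep head tail
    dots=${num//[^.]}        # keep only the periods
    commas=${num//[^,]}      # keep only the commas

    # Both repeated: a number cannot have two radix points.
    if (( ${#dots} > 1 && ${#commas} > 1 )); then printf '?\n'; return; fi
    # One repeated: it is the grouping character; the other one, if any, is the RP.
    if (( ${#dots} > 1 ));   then printf '%s\n' "${commas:0:1}"; return; fi
    if (( ${#commas} > 1 )); then printf '%s\n' "${dots:0:1}";   return; fi
    # One of each: the rightmost one is the RP.
    if (( ${#dots} == 1 && ${#commas} == 1 )); then
        tail=${num##*[.,]}   # text after the last separator
        printf '%s\n' "${num:$(( ${#num} - ${#tail} - 1 )):1}"
        return
    fi
    # No separator at all.
    if (( ${#dots} + ${#commas} == 0 )); then printf '\n'; return; fi
    # Exactly one separator: it is the RP unless exactly 3 digits follow it
    # and something precedes it, in which case the input stays ambiguous.
    sep=$dots$commas
    head=${num%%"$sep"*}
    tail=${num#*"$sep"}
    if (( ${#tail} != 3 )) || [[ -z $head ]]; then
        printf '%s\n' "$sep"
    else
        printf '?\n'
    fi
}

guess_radix '1.234,56'       # prints ,
guess_radix '1,234,567.89'   # prints .
guess_radix '3,14'           # prints ,
guess_radix '1,234'          # prints ?
```

The sketch does not check that the remaining characters are digits, nor that a repeated separator really sits at three-digit intervals; both checks would be needed before trusting the result on arbitrary input.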
