1

Coming from this answer to Format currency in Bash, I am looking for ways to determine which characters the current locale uses as numeric separators.

There are a lot of issues regarding locales and number formatting, for example:

printf '%.5f\n' $(bc -l <<<'4*a(1)')
3.14159

LANG=de_DE printf '%.5f\n' $(bc -l <<<'4*a(1)')
bash: printf: 3.14159265358979323844: invalid number
3,00000

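The bc calculator does not seem to handle locales: it always prints a period as the radix character, which printf rejects under a locale such as de_DE that expects a comma.

In the answer mentioned above, to find the decimal separator (or radix character), I used this: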
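As a minimal sketch of one way around the parse error: a string with a period as radix character (such as bc output) can be handed to printf under the C locale, so that the period is always accepted. The helper name fmt_c is hypothetical, and the literal value stands in for the bc result.

```bash
# Hypothetical helper: format a '.'-radix number string (such as bc output)
# to 5 decimals, parsing and printing it under the C locale.
fmt_c() {
    LC_ALL=C printf '%.5f\n' "$1"
}

fmt_c 3.14159265358979323844    # prints 3.14159
```

The output always uses a period, so this only avoids the "invalid number" error; it does not localize the result.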

int2amount() {
    local TIMEFORMAT=%U _decsep
    read _decsep < <(eval 'time true' 2>&1)
    _decsep=${_decsep//[0-9]}
    ...
}

This works fine:

pi() { local TIMEFORMAT=%U _decsep;read _decsep < <(eval 'time true' 2>&1);_decsep=${_decsep//[0-9]};
       local pi=$(bc -l <<<'4*a(1)')
       printf '%.5f\n' ${pi/./$_decsep}
}

pi
3.14159
LANG=de_DE pi
3,14159

But the thousands separator is a lot easier to find:

printf -v ts "%'d" 1111 ; ts=${ts//1}

There is no fork, so the system footprint is very light.

So I could imagine, at the beginning of a source file, something like:

numericSeparators() {
    local TIMEFORMAT=%U
    read NUM_DEC_SEP < <(eval 'time true' 2>&1)
    NUM_DEC_SEP=${NUM_DEC_SEP//[0-9]}
    printf -v NUM_THO_SEP "%'d" 1111
    NUM_THO_SEP=${NUM_THO_SEP//1}
}
numericSeparators
declare -r NUM_THO_SEP NUM_DEC_SEP
...

But I think <(eval 'time true' 2>&1) is heavy for the goal. I'm searching for a lighter and/or cleaner way to determine them (ideally both the decimal and the thousands separators).


Thanks to dan's answer, my functions become simpler and quicker!

pi() {
    local _decsep pi=$(bc -l <<<'4*a(1)')
    printf -v _decsep %.1f 1
    printf '%.5f\n' ${pi/./${_decsep:1:1}}
}
pi
3.14159
LANG=de_DE.UTF-8 pi
3,14159

numericSeparators() {
    local numtest
    printf -v numtest "%'.1f" 1111
    NUM_THO_SEP=${numtest:1:1}
    NUM_THO_SEP=${NUM_THO_SEP/1}
    NUM_DEC_SEP=${numtest: -2:1}
}
numericSeparators
for loctest in   C   en_US.UTF-8   de_DE.UTF-8   ;do
    LANG=$loctest numericSeparators
    printf '  %-12s decimal: \47%s\47 thousand: \47%s\47\n' \
            "$loctest"  "$NUM_DEC_SEP"  "$NUM_THO_SEP"
done
  C            decimal: '.' thousand: ''
  en_US.UTF-8  decimal: '.' thousand: ','
  de_DE.UTF-8  decimal: ',' thousand: '.'
F. Hauri - Give Up GitHub

2 Answers

3

You can get the locale's radix character (decimal separator) with:

printf -v ds '%#.1f' 1
ds=${ds//[0-9]}

And the thousands grouping separator, with:

printf -v ts "%'d" 1111
ts=${ts//1}

Some locales (e.g. C) have no thousands separator, in which case $ts is empty. Conversely, if the radix character is not defined by the locale, POSIX (printf(3)) says it should default to a period (.). The # flag guarantees that it will be printed.
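The two snippets above can be combined into one function, sketched here under the assumption that bash's printf builtin is used (for -v and the ' flag). The function name get_separators and the variable sample are hypothetical; ds and ts are the names used above.

```bash
# Hypothetical helper: set ds (radix character) and ts (thousands separator)
# for the locale currently in effect, using only the printf builtin (no fork).
get_separators() {
    local sample
    printf -v sample '%#.1f' 1     # e.g. "1.0" or "1,0"
    ds=${sample//[0-9]}            # drop the digits, keep the radix character
    printf -v sample "%'d" 1111    # e.g. "1,111", "1.111" or "1111"
    ts=${sample//1}                # drop the digits; empty if no separator
}

# The C locale is always available: radix '.', no thousands separator.
LC_ALL=C get_separators
printf 'decimal: [%s] thousands: [%s]\n' "$ds" "$ts"
# prints: decimal: [.] thousands: []
```

Calling it with another locale in front, for example LC_ALL=de_DE.UTF-8 get_separators, only gives different results if that locale is installed on the system.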

dan
0

In the vast majority of cases, you don't even need to know which locale is in effect to decode a value properly.

Because a number simply can't have 2 radix points (RP), in any base, one can just use gsub() or a similar quick counting tool to figure out which one of , vs. . occurs more than once; that one cannot be the RP.

  • If both do, the input is likely problematic to begin with.

  • If exactly one of each exists, the rightmost one has to be the RP.

  • When there is only one such character and it is ambiguous, consider:

If the number of digits to the right of that character is anything other than exactly 3 (including 0), it cannot be the thousands separator.

More often than not, a thousands separator is bookended by digits on both sides, whereas a radix point at the leading or trailing edge isn't all that uncommon.

Only numbers of 4 to 6 digits (counting the digits below the radix point, assuming it's still ambiguous) would require extra context to decode properly.
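The rules above can be sketched in bash as follows. This is one possible reading of the heuristic, not a definitive implementation: the function name guess_radix and its result words (none, ambiguous, invalid) are hypothetical, the input is assumed to contain only digits, commas and periods, and the size of each thousands group is not validated.

```bash
# Hypothetical helper: guess which of ',' or '.' is the radix character in $1.
# Prints '.', ',', 'none' (no radix present), 'ambiguous' or 'invalid'.
guess_radix() {
    local num=$1 commas dots sep left right
    commas=${num//[^,]}            # keep only the commas
    dots=${num//[^.]}              # keep only the periods
    if (( ${#commas} > 1 && ${#dots} > 1 )); then
        echo invalid               # both repeated: problematic input
    elif (( ${#commas} > 1 )); then
        # repeated commas group thousands, so a period, if any, is the radix
        if (( ${#dots} )); then echo .; else echo none; fi
    elif (( ${#dots} > 1 )); then
        if (( ${#commas} )); then echo ,; else echo none; fi
    elif (( ${#commas} == 1 && ${#dots} == 1 )); then
        sep=${num//[^.,]}          # both separators, in order of appearance
        echo "${sep: -1}"          # the rightmost one is the radix
    elif (( ${#commas} + ${#dots} == 0 )); then
        echo none
    else
        sep=$commas$dots           # the single separator present
        left=${num%%"$sep"*}       # digits before it
        right=${num#*"$sep"}       # digits after it
        if (( ${#right} != 3 || ${#left} == 0 || ${#left} > 3 )); then
            echo "$sep"            # cannot be a thousands separator
        else
            echo ambiguous         # e.g. 1,234: needs extra context
        fi
    fi
}

guess_radix 1.234,56        # prints ,
guess_radix 1,234,567.89    # prints .
guess_radix 3,14            # prints ,
guess_radix 1,234           # prints ambiguous
```

The last branch is where the 4 to 6 digit case ends up: one to three digits on the left and exactly three on the right.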

RARE Kpop Manifesto