15

I have this C code:

locale_t myLocale = newlocale(LC_NUMERIC_MASK, "en_US", (locale_t) 0);
uselocale(myLocale);
ptrLocale = localeconv();
ptrLocale->thousands_sep = (char *) "'";

int i1 = snprintf( s1, sizeof(s1), "%'d", 123456789);

The output in s1 is 123,456,789.

Even though I set ->thousands_sep to ', it is ignored. Is there a way to set any character as the thousands separator?

Jonathan Leffler
Peter VARGA

6 Answers

3

Here is a very simple solution that works on every Linux distribution and, unlike my first answer, does not need a glibc hack:


All these steps must be performed in the original glibc source directory, NOT in the build directory, after you have built glibc using a separate build directory as suggested by these instructions.

My new locale file is called en_AT.

  1. In the localedata/locales/ directory, copy the existing file en_US to a new file en_AT.
  2. Change every thousands_sep entry to thousands_sep "<U0027>", or to whatever character you want as the thousands separator.
  3. In the new file, change all occurrences of en_US to en_AT.
  4. Add the line en_AT.UTF-8/UTF-8 \ to the file localedata/SUPPORTED.
  5. In the build directory, run make localedata/install-locales.
  6. The new locale is then automatically added to the system and is immediately available to the program.

In the C/C++ program you switch to the new thousands separator character with:

setlocale( LC_ALL, "en_AT.UTF-8" );

using it with printf( "%'d", 1000000 ); which produces this output

1'000'000


Remark: If the program needs different localizations that are determined at runtime, you can use this example from the man pages: load the requested locale and replace only its LC_NUMERIC settings with those from en_AT.
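A minimal sketch of that idea, assuming a POSIX.1-2008 libc such as glibc. The helper name format_grouped is made up for illustration, and the locale name you pass (for example "en_AT.UTF-8") must actually be installed, otherwise the helper returns -1:

```c
#define _POSIX_C_SOURCE 200809L   /* for newlocale(), uselocale(), duplocale() */
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: formats `value` with the LC_NUMERIC rules of
 * `numeric_locale`, leaving every other category of the calling thread's
 * locale as it was.
 * Returns the snprintf() result, or -1 if no locale object could be created. */
int format_grouped(char *buf, size_t size, const char *numeric_locale, long value)
{
    /* Start from a copy of the global locale so that only LC_NUMERIC changes. */
    locale_t base = duplocale(LC_GLOBAL_LOCALE);
    if (base == (locale_t)0)
        return -1;

    /* On success newlocale() takes over `base`; on failure `base` is untouched. */
    locale_t mixed = newlocale(LC_NUMERIC_MASK, numeric_locale, base);
    if (mixed == (locale_t)0) {
        freelocale(base);
        return -1;
    }

    locale_t previous = uselocale(mixed);   /* affects the calling thread only */
    int n = snprintf(buf, size, "%'ld", value);
    uselocale(previous);                    /* restore before freeing */
    freelocale(mixed);
    return n;
}
```

With the en_AT locale from the steps above installed, format_grouped(buf, sizeof buf, "en_AT.UTF-8", 1000000) should give the same grouping as the setlocale() variant, without touching the global locale.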

Peter VARGA
2

The localeconv() function only reads the locale settings; assigning to ptrLocale->thousands_sep does not change those settings for the current locale.
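To make the read-only use concrete, here is a minimal C sketch that copies the separators out of the structure returned by localeconv() into caller-owned storage instead of writing through the returned pointer. The struct and function names are made up for illustration:

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical container for a private copy of the two separators. */
struct separators {
    char decimal_point[8];
    char thousands_sep[8];
};

/* Copies the current locale's separators into `out`.
 * The lconv structure itself is only read, never modified. */
void snapshot_separators(struct separators *out)
{
    const struct lconv *lc = localeconv();
    snprintf(out->decimal_point, sizeof out->decimal_point, "%s", lc->decimal_point);
    snprintf(out->thousands_sep, sizeof out->thousands_sep, "%s", lc->thousands_sep);
}
```

The copy stays valid even after a later setlocale() call overwrites the structure that localeconv() points to.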

EDIT:

I do not know how to do this in C, but lots of examples with C++ output can be found. See the following example in C++:

#include <iostream>
#include <locale>
using namespace std;

struct myseps : numpunct<char> { 
   // use ' as separator
   char do_thousands_sep() const { return '\''; } 

   // digits are grouped by 3
   string do_grouping() const { return "\3"; }
};

int main() {
  cout.imbue(locale(locale(), new myseps));
  cout << 1234567; // the result will be 1'234'567
}

EDIT 2:

The C++ reference says:

localeconv() returns a pointer to a filled-in object of type struct lconv. The values contained in the object can be overwritten by subsequent calls to localeconv and do not directly modify the object. Calls to setlocale with category values of LC_ALL, LC_MONETARY, or LC_NUMERIC overwrite the contents of the structure.

I tried the following example in MS Visual Studio 2012 (I understand that it is bad and unsafe style):

#include <stdio.h>
#include <locale.h>
#include <string.h>

int main() {
    setlocale(LC_NUMERIC, "");
    struct lconv *ptrLocale = localeconv();
    strcpy(ptrLocale->decimal_point, ":");
    strcpy(ptrLocale->thousands_sep, "'");
    char str[20];
    printf("%10.3lf \n", 13000.26);
    return 0;
}

and I saw the result:

  13000:260

Therefore it appears that decimal_point and thousands_sep can be changed through the pointer returned by localeconv(), but printf ignores thousands_sep.

EDIT 3:

Updated C++ example:

#include <iostream>
#include <locale>
#include <sstream>
using namespace std;

struct myseps : numpunct<char> { 
   // use ' as separator
   char do_thousands_sep() const { return '\''; } 

   // digits are grouped by 3
   string do_grouping() const { return "\3"; }
};

int main() {
  stringstream ss;
  ss.imbue(locale(locale(), new myseps));
  ss << 1234567;  // format into the string stream using the imbued locale
  printf("%s\n", ss.str().c_str()); // ss.str() returns a std::string; c_str() exposes it as const char*
}
VolAnd
  • But which structure is `printf()` accessing? There must be a way to override the thousands separator character. I dug through the `printf()` source in GNU glibc and it is not hardcoded there! – Peter VARGA Feb 26 '15 at 14:30
  • I suppose you need `setlocale()` function to change current locale – VolAnd Feb 26 '15 at 14:32
  • Also, check whether `snprintf` is locale-dependent function – VolAnd Feb 26 '15 at 14:35
  • @VoIAnd: Yes, but how do I set explicitly another thousand separator character? Calling `setlocale()` with a predefined locale "en_US", "de_DE", ... uses the separator as defined for the locale. – Peter VARGA Feb 26 '15 at 14:35
  • @AlBundy : I suppose, `printf`-family functions just ignore `ptrLocale->thousands_sep` settings. See **EDIT 2** – VolAnd Feb 26 '15 at 15:31
  • @VolAnd: No, it does not. You get it with the ' flag, and changing the locale from en_US to de_DE also changes the character. I read the source code of GNU glibc `printf()` and it respects it. My question is how do I get to the buffer for the thousands separator character so I can change it. – Peter VARGA Feb 26 '15 at 15:33
  • @Edit 2: This is funny. 1) In SLES I get Segmentation fault when I use `strcpy()` to overwrite it. 2) Try this format specifier: `printf("%'10.3lf\n", 13000.26);` - use the `'` character – Peter VARGA Feb 26 '15 at 15:35
  • That is really funny - `printf("%'10.3lf \n", 13000.26);` compiled with MSVS print out just `'10.3lf` . – VolAnd Feb 26 '15 at 15:38
  • Let's try another way: in C++, how can I use the `.imbue` functions so that the result is returned as `std::string` or `char *` and I can still provide more format specifiers, like this: `"%llu bytes loaded within %.4fms in thread #%03d"` – Peter VARGA Feb 26 '15 at 15:39
  • I can confirm the possibility to change the decimal separator, but could not find either how to use a thousand separator with `printf`. – Serge Ballesta Feb 26 '15 at 15:40
  • BTW in my own tests, I did not use strcpy to set the decimal or thousand separator but simply `ptrLocale->decimal_point = ":"`. If thousand separator was empty string, using `strcpy` with a non empty source could lead to buffer overrun. – Serge Ballesta Feb 26 '15 at 15:43
  • @VoIAnd: This is not exactly what I want. Using `imbue` is just another way as I had it till now with my self written functions. I have to think about it! – Peter VARGA Feb 26 '15 at 16:06
  • @those who are wondering how to use printf with thousand separators: you must `setlocale(LC_NUMERIC, "");` see answer to: "How can I format currency with commas in C?" https://stackoverflow.com/a/11695126/1973022 . – u_Ltd. Apr 12 '18 at 08:07
1

This answer is derived from VolAnd's one.

According to this source, the thousands separator is only used with the non-standard ' flag.

So if your printf is POSIX.1-2008 compatible, you could use:

setlocale(LC_NUMERIC, "");
struct lconv *ptrLocale = localeconv();
ptrLocale->decimal_point = ":";
ptrLocale->thousands_sep = "'";
char str[20];
printf("%'10.3lf \n", 13000.26);
return 0;
Serge Ballesta
  • This is the code from my question where I am asking how to do it. Your code does not work for me. I have to use `setlocale(LC_NUMERIC, "en_US");` in order to see at least the American thousand separator character. – Peter VARGA Feb 26 '15 at 16:05
  • @AlBundy : with this code, I could successfully change the decimal separator. Unfortunately, the two systems on which I tried it do not support the `'` non standard flag (whatever locale I use). – Serge Ballesta Feb 26 '15 at 16:14
  • Note that it depends on which standard you choose as to whether it is standard or not. POSIX specifies that [`printf()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/printf.html) supports the `'` as indicating that the thousands separator should be printed appropriately. Also, Mac OS X (10.10.5) and by inference BSD has a set of `_l` printing functions: `int printf_l(locale_t loc, const char * restrict format, ...);` and `int fprintf_l(FILE * restrict stream, locale_t loc, const char * restrict format, ...);` for example. These are the best way to go if they're available. – Jonathan Leffler Sep 05 '15 at 17:04
  • More seriously, note that the POSIX specification for [`localeconv()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/localeconv.html) explicitly says: _The `localeconv()` function need not be thread-safe. … The `localeconv()` function shall return a pointer to the filled-in object. The application shall not modify the structure to which the return value points, nor any storage areas pointed to by pointers within the structure._ A portable application may not do as this answer suggests. – Jonathan Leffler Sep 05 '15 at 17:14
0

There is a really dirty hack for changing the thousands separator character for printf():

  1. Download the GNU libc source.
  2. Run configure --prefix=/usr/glibc-version.
  3. Run make -j 8.
  4. Copy the very long compiler command, with all its switches, from the make output.
  5. Write the C source file setMyThousandSeparator.c (content below).
  6. Compile this source file with the gcc switches from step 4.
  7. In your normal C source code, call setMyThousandSeparator("'") before the printf() call.
  8. Link setMyThousandSeparator.o with your project.

So far I have only tried it with libc linked statically, but it works.

Content of setMyThousandSeparator.c:

#include <locale/localeinfo.h>

void setMyThousandSeparator(char * sMySeparator)
{
    _NL_CURRENT (LC_NUMERIC, THOUSANDS_SEP) = sMySeparator;
}

Info: This solution is thread safe because it is accessing the same data as printf() does!

Peter VARGA
0

Here is a specialized C function that I use for the uint64_t type, but it can easily be generalized. Basically, it injects the thousands separators into the string produced by snprintf().

This method is independent of the locale, the C standard used, and so on, and of course you don't have to recompile GNU libc ;)

#include <inttypes.h>   /* PRIu64 */
#include <stdint.h>
#include <stdio.h>

/* buf must hold at least 32 bytes; the returned pointer points into buf. */
char* th_sep_u64(uint64_t val, char* buf) {
   char tmpbuf[32]; //18'446'744'073'709'551'615 -> 26 chars
   int  nch, toffs, pos;
   pos   = 1;
   toffs = 31;
   nch   = snprintf(tmpbuf, sizeof tmpbuf, "%" PRIu64, val);
   nch  -- ;          // index of the last digit
   buf[toffs] = 0;

   // copy the digits from right to left
   for (; nch>=0; --nch) {
      toffs -- ;
      buf[toffs] = tmpbuf[nch];
      if ((0 == (pos % 3)) && (nch > 0)) {
         toffs -- ;
         buf[toffs] = '\''; //inject the separator
      }
      pos ++ ;
   }
   buf += toffs;
   return buf;
}

Usage:

{
   char     cbuf[32]; 
   uint64_t val = 0xFFFFFFFFFFFFFFFFll;

   printf("%s", th_sep_u64(val, cbuf));

   //result: 18'446'744'073'709'551'615
}
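One possible generalization, as a sketch rather than a drop-in replacement: the separator becomes a parameter and the caller's buffer size is checked. The name group_u64 and its signature are my own assumptions, not part of the function above:

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical variant: writes `val` into `out`, placing `sep` between
 * groups of three digits. Returns the number of characters written, or -1
 * if `out` is too small for the result plus its terminating NUL. */
int group_u64(uint64_t val, char sep, char *out, size_t size)
{
    char digits[21];                          /* 20 digits + NUL */
    int ndigits = snprintf(digits, sizeof digits, "%" PRIu64, val);
    int nseps = (ndigits - 1) / 3;            /* separators needed */

    if ((size_t)(ndigits + nseps) + 1 > size)
        return -1;

    size_t w = 0;
    for (int i = 0; i < ndigits; ++i) {
        out[w++] = digits[i];
        int remaining = ndigits - 1 - i;      /* digits still to be copied */
        if (remaining > 0 && remaining % 3 == 0)
            out[w++] = sep;
    }
    out[w] = '\0';
    return (int)w;
}
```

Called as group_u64(1234567, '\'', out, sizeof out), it fills out with 1'234'567. Unlike th_sep_u64(), the result starts at the beginning of the caller's buffer.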

Regards

vtomazzi
  • This is a nice function but it becomes _tiresome_ when your format string contains lot of different formats. Recompiling GNU glibc can indeed become very tricky and I don't consider it as a good solution - 5 years later. Now, I solve it with building a new LOCALE which is really a very simple task and the huge advantage is, that the thousand separator works even in the Linux Bash command line so it becomes global. – Peter VARGA May 01 '20 at 16:25
  • Yeah, it all depends on what You need to achieve. I need to run my code on many machines, different OSes, so for me it would be _tiresome_ to build / install / change LOCALE for each case. – vtomazzi May 01 '20 at 17:45
0

Maybe "just" add a new printf specifier:

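#include <printf.h>   /* register_printf_specifier(), struct printf_info (glibc) */
#include <stdio.h>

static int printf_arginfo_M(const struct printf_info *info, size_t n, int *argtypes, int *size) {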

    if ( info->is_long_double ) {               // %llM
        size[0] = sizeof(long long);
        if ( n > 0 ) argtypes[0] = PA_INT | PA_FLAG_LONG_LONG;
    }
    else if ( info->is_long ) {                 // %lM
        size[0] = sizeof(long);
        if ( n > 0 ) argtypes[0] = PA_INT | PA_FLAG_LONG;
    }
    else {
        size[0] = sizeof(int);                  // %M
        if ( n > 0 ) argtypes[0] = PA_INT;
    }

    return 1;
}

static int printf_output_M(FILE *stream, const struct printf_info *info, const void *const args[])
{
    long long number;

    if ( info->is_long_double ) {               // %llM
        number = *(const long long*)(args[0]);
    }
    else if ( info->is_long ) {                 // %lM
        number = *(const long*)(args[0]);
    }
    else {                                      // %M
        number = *(const int*)(args[0]);
    }

    long long value = (number < 0) ? -number : number;
    int len;
    char buf[32];
    char *pos = &buf[31];
    int i = 0;

    *pos = '\0';

    do {
        if ( (i % 3 == 0) && (i > 0) ) *--pos = '.';
        *--pos = '0' + value % 10;
        value /= 10;
        i++;
    } while (value > 0);

    if (number < 0) *--pos = '-';

    len = fprintf(stream, "%s", pos);

    return len;
}

Usage:

register_printf_specifier('M', printf_output_M, printf_arginfo_M);

printf("%M\n", -1234567890);
printf("%lM\n", -1234567890123456789l);
printf("%llM\n", -1234567890123456789ll);

The downside is that gcc complains about the new specifier, so you might want to disable these warnings:

#pragma GCC diagnostic ignored "-Wformat"
#pragma GCC diagnostic ignored "-Wformat-extra-args"