So, here's my problem:
If someone wants to output visually aligned strings using printf, they'll obviously use %<n>s (where <n> is the minimum field width). This works just fine, unless one of the strings contains Unicode (UTF-8) characters.
Take this very basic example:
#include <stdio.h>

int main(void)
{
    /* "αβγ": three visible characters, but six bytes in UTF-8 */
    const char *s1 = "\u03b1\u03b2\u03b3";
    const char *s2 = "abc";
    printf("'%6s'\n", s1);
    printf("'%6s'\n", s2);
    return 0;
}
which will produce the following output:
'αβγ'
'   abc'
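This isn't all that surprising, because printf of course doesn't know that \u03b1 (which is encoded as two bytes, i.e. two chars, in UTF-8) produces only a single glyph on the output device (assuming the device supports UTF-8). The field width of %s is counted in bytes, not in displayed characters.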
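To make the byte-versus-character mismatch concrete, here is a small sketch that counts code points by skipping UTF-8 continuation bytes (bit pattern 10xxxxxx). The name utf8_codepoints is my own, not a library function, and it assumes valid UTF-8. Code points only equal display columns for characters like α or µ; combining marks or East Asian wide characters would need something like POSIX wcwidth instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts UTF-8 code points. Every byte that is not a
   continuation byte (10xxxxxx) starts a new code point. Assumes valid UTF-8. */
size_t utf8_codepoints(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p) {
        if ((*p & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For "αβγ" this returns 3, while strlen (the length %6s effectively works with) returns 6, which is exactly why no padding is added to s1.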
Now assume that I generate s1 and s2, but have no control over the format string used to output them. My current understanding is that nothing I could possibly do to s1 would fix this, because I'd have to somehow fool printf into thinking that s1 is shorter than it actually is. However, since I also control s2, my current solution is to add a non-printing character to s2 for each Unicode character in s1, which looks something like this:
#include <stdio.h>

int main(void)
{
    const char *s1 = "\u03b1\u03b2\u03b3";
    /* one \x06 (ACK) filler byte per extra UTF-8 byte in s1 */
    const char *s2 = "abc\x06\x06\x06";
    printf("'%6s'\n", s1);
    printf("'%6s'\n", s2);
    return 0;
}
This produces the desired output (even though the actual width no longer corresponds to the specified field width, but I'm willing to accept that):
'αβγ'
'abc'
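A slightly more general version of the filler idea, as a sketch: append_filler is a hypothetical helper, and it uses \x06 only because that is the byte chosen in the question. It appends one filler byte per UTF-8 continuation byte that the reference string has beyond the padded string. For two-byte characters like α or µ that is "one per Unicode character", but a three-byte character such as € (U+20AC) would need two.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of UTF-8 continuation bytes (10xxxxxx) in s. For valid UTF-8 this is
   how many bytes longer s is than its number of code points. */
static size_t continuation_bytes(const char *s)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p)
        if ((*p & 0xC0) == 0x80)
            ++n;
    return n;
}

/* Hypothetical helper: copies s into dst and appends \x06 filler bytes so that
   dst has the same byte surplus over code points as `reference`.
   Returns 0 on success, -1 if dst is too small. */
int append_filler(char *dst, size_t size, const char *s, const char *reference)
{
    assert(dst != NULL && s != NULL && reference != NULL);
    size_t ref_extra = continuation_bytes(reference);
    size_t own_extra = continuation_bytes(s);
    size_t fill = ref_extra > own_extra ? ref_extra - own_extra : 0;
    size_t len = strlen(s);

    if (len + fill + 1 > size)
        return -1;
    memcpy(dst, s, len);
    memset(dst + len, 0x06, fill);
    dst[len + fill] = '\0';
    return 0;
}
```

Called with s = "abc" and reference = "αβγ", this builds the same "abc\x06\x06\x06" as the hand-written literal.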
For context: the example above only illustrates the Unicode problem. My actual code prints numbers with SI prefixes, only one of which (µ) is a Unicode (non-ASCII) character. Therefore I would generate strings containing at most one prefix character, either ASCII or Unicode (which is why I can accept the resulting offset in the field width).
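Applied to the SI-prefix case, the scheme could look roughly like the sketch below. format_si is a hypothetical name, and the prefix range (p through T) and the "%.2f" precision are assumptions for illustration. The micro sign is written as explicit UTF-8 bytes (\xc2\xb5 for U+00B5) so the result does not depend on the compiler's execution character set. Every result then has exactly one byte more than visible characters, so right-aligned %<n>s output stays visually aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical SI formatter: the micro sign is the only two-byte prefix, so
   every other result gets one \x06 filler byte appended. Returns the length
   snprintf reports (bytes, excluding the terminating NUL). */
int format_si(double value, char *buf, size_t size)
{
    static const char *const prefix[] = {
        "p", "n", "\xc2\xb5", "m", "", "k", "M", "G", "T"
    };
    enum { MICRO = 2, NONE = 4, LAST = 8 };
    assert(buf != NULL && size > 0);

    int i = NONE;
    double mag = value < 0 ? -value : value;
    /* scale into [1, 1000) while a smaller/larger prefix is available */
    while (mag != 0.0 && mag < 1.0 && i > 0) {
        value *= 1000.0;
        mag *= 1000.0;
        --i;
    }
    while (mag >= 1000.0 && i < LAST) {
        value /= 1000.0;
        mag /= 1000.0;
        ++i;
    }
    return snprintf(buf, size, "%.2f%s%s", value, prefix[i],
                    i == MICRO ? "" : "\x06");
}
```

Both 4.7e-6 and 4700.0 come out as six bytes ("4.70µ" and "4.70k" plus filler). Rounding edge cases such as 999.999 printing as "1000.00" without switching prefix are not handled in this sketch.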
So, my questions are:
- Is there a better solution for this?
- Is \x06 (ACK) a sensible choice (i.e. a character without undesired side effects)?
- Can you think of any problems with this approach?