26

It seems like sprintf has a problem with foreign characters. Or am I doing something wrong? It looks like it works when I remove characters like åäö from the string. Should that be necessary?

I want the following lines to be aligned correctly for a report:

2011-11-27   A1823    -Ref. Leif  -           12 873,00    18.98
2011-11-30   A1856    -Rättat xx -            6 594,00    19.18

I'm using sprintf() like this: %-12s %-8s -%-10s -%20s %8.2f

Using: php-5.3.23-nts-Win32-VC9-x86

Martin Prikryl
Mille
  • This problem (that different characters consist of different numbers of bytes and different grapheme clusters consist of different numbers of characters) is *somewhat* similar to (but not the same as) http://stackoverflow.com/questions/9166698/aligning-based-on-the-width-of-letters-with-sprintf. The bottom line is that it might be easiest to put the data in an HTML table instead. – PleaseStand Apr 14 '13 at 20:24
  • Yeah, this is definitely not a duplicate: this question is about multibyte characters in sprintf(), the other one is about font display widths. – xyphoid Aug 29 '13 at 23:26
  • This was not a duplicate question at all... You can do the trick with: utf8_encode(sprintf('format', utf8_decode($yourstring)));... Of course you'll have to decode every argument if several are given. – Gérald Croës Sep 25 '13 at 13:57
  • This question is about characters with a Unicode code point above 127, which use more than one byte when encoded with UTF-8. Unfortunately `sprintf` and `printf` don't handle that. When printing a 2-character string that uses 6 bytes when encoded with UTF-8, `%8s` prints the wrong number of spaces (8-6=2) instead of (8-2=6). This has _**NOTHING**_ to do with the font used, unlike the question that this one is supposed to be a duplicate of. This question is about PHP's lack of support for multibyte characters. – some Jan 17 '14 at 23:28

4 Answers

13

Strings in PHP are basically arrays of bytes (not characters). They cannot work natively with multibyte encodings (such as UTF-8).

For details see:
https://www.php.net/manual/en/language.types.string.php#language.types.string.details

Most string functions in PHP have a multibyte equivalent (with the mb_ prefix), but sprintf does not.
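To make the byte-versus-character difference concrete, here is a minimal sketch. It assumes the script file is saved as UTF-8 and that the mbstring extension is loaded; the sample word is taken from the question's report lines.

```php
<?php
// Assumption: this file is UTF-8 encoded and mbstring is available.
$word = 'Rättat';

$bytes = strlen($word);             // counts bytes
$chars = mb_strlen($word, 'UTF-8'); // counts characters

// sprintf() pads to 10 *bytes*, so the result is one character short of 10.
$padded = sprintf('%-10s', $word);

printf("bytes=%d chars=%d padded chars=%d\n", $bytes, $chars, mb_strlen($padded, 'UTF-8'));
// prints "bytes=7 chars=6 padded chars=9"
```

The "ä" takes two bytes in UTF-8, which is exactly the one column by which the report line in the question is misaligned.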

There's a user comment (by "viktor at textalk dot com") with a multibyte implementation of sprintf on the function's documentation page at php.net. It may work for you:
https://www.php.net/manual/en/function.sprintf.php#89020

Martin Prikryl
  • Correct explanation, but the linked function does not work for me, even after doing the mb_* function name replacements mentioned in the remarks. I'd hoped for a better solution than the one @nimmneun has provided; it's my current hacky solution too. – flowtron Jun 12 '19 at 14:37
12

I was actually trying to find out if PHP ^7 finally has a native mb_sprintf() but apparently no xD.

For the sake of completeness, here is a simple solution I've been using in some old projects. It just adds the difference between strlen and mb_strlen to the desired $targetLength. The non-multibyte example is included only for easy comparison =).

$text = "Gultigkeitsprufung ist fehlgeschlagen: %{errors}";
$mbText = "Gültigkeitsprüfung ist fehlgeschlagen: %{errors}";
$mbTextRussian = "Проверка не удалась: %{errors}";

$targetLength = 60;
$mbTargetLength = strlen($mbText) - mb_strlen($mbText) + $targetLength;
$mbRussianTargetLength = strlen($mbTextRussian) - mb_strlen($mbTextRussian) + $targetLength;

printf("%{$targetLength}s\n", $text);
printf("%{$mbTargetLength}s\n", $mbText);
printf("%{$mbRussianTargetLength}s\n", $mbTextRussian);

result

            Gultigkeitsprufung ist fehlgeschlagen: %{errors}
            Gültigkeitsprüfung ist fehlgeschlagen: %{errors}
                              Проверка не удалась: %{errors}

update 2019-06-12


@flowtron made me give it another thought. A simple mb_sprintf() could look like this.

function mb_sprintf($format, ...$args) {
    $params = $args;

    $callback = function ($length) use (&$params) {
        $value = array_shift($params);
        return strlen($value) - mb_strlen($value) + $length[0];
    };

    $format = preg_replace_callback('/(?<=%|%-)\d+(?=s)/', $callback, $format);

    return sprintf($format, ...$args);
}

echo mb_sprintf("%-10s %-10s %10s\n", 'thüs', 'wörks', 'ök');
echo mb_sprintf("%-10s %-10s %10s\n", 'this', 'works', 'ok');

result

thüs       wörks              ök
this       works              ok

I only did some happy-path testing here, but it works for PHP >= 5.6 and should be good enough to give people an idea of how to encapsulate the behavior. It does not work with the repetition/order modifiers though, e.g. %1$20s will be ignored and remain unchanged.
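Note also that the callback above takes one argument per width-carrying %s it finds, so a format that mixes in other specifiers (say %d before a %10s) pairs the widths with the wrong arguments. A possible way around both limitations is to walk over every conversion specification and track the argument index explicitly. The following is only a sketch under assumptions: the name mb_sprintf_sketch and the regular expression are hypothetical, it needs PHP 7.1+ and mbstring, it assumes UTF-8 input, and it does not handle a precision on %s (which still truncates by bytes).

```php
<?php
// Hypothetical helper, not part of PHP: widens each %Ns by (bytes - characters)
// of the argument it will actually receive, then delegates to vsprintf().
function mb_sprintf_sketch(string $format, ...$args): string
{
    $next = 0; // index of the next sequential (non-positional) argument

    // %[argnum$][flags][width][.precision]type
    $pattern = '/%(?:(\d+)\$)?((?:[-+ 0]|\'.)*)(\d+)?((?:\.\d+)?)([a-zA-Z%])/';

    $adjusted = preg_replace_callback($pattern, function (array $m) use (&$next, $args) {
        [$spec, $pos, $flags, $width, $precision, $type] = $m;

        if ($type === '%') {
            return $spec; // literal percent sign, consumes no argument
        }

        // Positional specs (%2$s) name their argument; others take the next one.
        $index = $pos !== '' ? (int) $pos - 1 : $next++;

        if ($type !== 's' || $width === '' || !array_key_exists($index, $args)) {
            return $spec; // nothing to adjust
        }

        $value = (string) $args[$index];
        $newWidth = (int) $width + strlen($value) - mb_strlen($value, 'UTF-8');

        return '%' . ($pos !== '' ? $pos . '$' : '') . $flags . $newWidth . $precision . $type;
    }, $format);

    return vsprintf($adjusted, $args);
}

echo mb_sprintf_sketch("%-10s|%5d|%10s\n", 'thüs', 42, 'ök');
echo mb_sprintf_sketch("%2\$-6s|%1\$s\n", 'a', 'öl');
```

Replacing mb_strlen with mb_strwidth in the sketch would count display columns instead of characters, which matters for double-width scripts.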

nimmneun
  • I had hoped to find something less hacky, because this is the way I've been doing it too. Upvoted, since the routine linked in @Martin Prikryl's answer doesn't work (for me). – flowtron Jun 12 '19 at 14:38
  • you made me give it another thought =) – nimmneun Jun 12 '19 at 20:54
4

If you're using only characters that fit in the ISO-8859-1 character set, you can convert the strings before formatting and convert the result back to UTF-8 when you are done:

utf8_encode(sprintf("%-12s %-8s", utf8_decode($paramOne), utf8_decode($paramTwo)));
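Since utf8_encode() and utf8_decode() are deprecated as of PHP 8.2, the same round trip can be sketched with mb_convert_encoding(). The helper name latin1_sprintf is hypothetical, and the sketch assumes UTF-8 input whose characters all exist in ISO-8859-1; anything outside that set cannot survive the conversion.

```php
<?php
// Hypothetical helper: format in a single-byte encoding, then convert back.
function latin1_sprintf($format, ...$args)
{
    // Convert only strings; numbers for %d / %f must stay numeric.
    $toLatin1 = function ($value) {
        return is_string($value)
            ? mb_convert_encoding($value, 'ISO-8859-1', 'UTF-8')
            : $value;
    };

    // In ISO-8859-1 one character is one byte, so sprintf() pads correctly.
    $formatted = sprintf($toLatin1($format), ...array_map($toLatin1, $args));

    return mb_convert_encoding($formatted, 'UTF-8', 'ISO-8859-1');
}

echo latin1_sprintf("%-12s %-8s|\n", 'Rättat', 'Leif');
```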
Vestman
0

Problem

There are no multibyte format functions.

Idea

Converting the input strings is not always possible, so adjust the format widths instead. A format of %4s should mean 4 display columns (not characters, see the footnote), but PHP's format functions count bytes. So add the difference (bytes - display width) to each width in the format.

Implementations

from @nimmneun

function mb_sprintf($format, ...$args) {
    $params = $args;
    $callback = function ($length) use (&$params) {
        $value = array_shift($params);
        return $length[0] + strlen($value) - mb_strwidth($value);
    };
    $format = preg_replace_callback('/(?<=%|%-)\d+(?=s)/', $callback, $format);
    return sprintf($format, ...$args);
}

And don't forget another option, str_pad($input, $length, $pad_char=' ', STR_PAD_RIGHT):

function mb_str_pad(...$args) {
    $args[1] += strlen($args[0]) - mb_strwidth($args[0]);
    return str_pad(...$args);
}

Footnote

Many East Asian characters (CJK ideographs, for example) take 3 bytes in UTF-8, occupy 2 display columns, and count as 1 character. If your format is %4s and the input is one such character, you need two spaces of padding, not three.
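The three measures from the footnote can be checked directly. This is a minimal sketch assuming a UTF-8 source file and the mbstring extension; pad_to_width is a hypothetical name, chosen because PHP 8.3 ships a built-in mb_str_pad(), so a user-defined function with that name would clash there.

```php
<?php
// Hypothetical helper: pad to a number of display columns rather than bytes.
function pad_to_width($text, $columns, $padType = STR_PAD_RIGHT)
{
    // str_pad() counts bytes, so add the bytes that do not show up as columns.
    $extraBytes = strlen($text) - mb_strwidth($text, 'UTF-8');

    return str_pad($text, $columns + $extraBytes, ' ', $padType);
}

$han = '漢';

printf(
    "bytes=%d chars=%d width=%d\n",
    strlen($han),
    mb_strlen($han, 'UTF-8'),
    mb_strwidth($han, 'UTF-8')
);
// prints "bytes=3 chars=1 width=2"

echo '[', pad_to_width($han, 4), "]\n"; // prints "[漢  ]" (two spaces of padding)
```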

Jehong Ahn