Short answer:
You do not have to take any additional steps when using Unicode characters instead of plain ASCII. Current versions of zsh
fully support Unicode characters and can handle them correctly. So even if a character is encoded by multiple bytes, zsh
will still know that it is only a single character.
When to use %{...%} and %G
%{...%} tells zsh that the string inside does not change the cursor position. This is useful, for example, when adding escape sequences such as those used for setting colors:
print -P '%{\e[31m%}terminal red%{\e[0m%}'
print -P '%{\e[38;2;0;127;255m%}#007FFF%{\e[0m%}'
Without %{...%}, zsh would have to assume that each character of the escape sequence moves the cursor one position to the right.
Using %G inside %{...%} (or using %1{...%}) tells zsh to assume that a single character will be output. This is for counting purposes only; it will not move the cursor on its own.
According to the ZSH Manual:
This is useful when outputting characters that otherwise cannot be correctly handled by the shell, such as the alternate character set on some terminals.
As zsh is able to handle Unicode characters, %G is unnecessary for them (although not necessarily wrong).
Reason for unexpected results of strlen "%{↓%G%}":
This happens because strlen only tries to remove any null-length prompt sequences (like %B or %F{red}) instead of actually measuring the printed length of the resulting string (which is probably impossible anyway). In many cases this works well enough, but it fails spectacularly in the case of "%{↓%G%}", which is actually equivalent to "↓" in the context of zsh prompts.
Explanation:
In order to find these null-length prompt sequences, strlen matches its input against this pattern:
local invisible='%([BSUbfksu]|([FB]|){*})'
This also contains the sub-pattern %{*}, which matches %{...%}. Then
LEN=${#${(S%%)FOO//$~invisible/}}
just removes every matching substring from FOO before counting the characters.
On top of that, it does not actually handle %G in any way and just removes it together with the surrounding %{...%}.
As the whole string "%{↓%G%}" matches the pattern, it is removed completely, resulting in the unexpected character count of 0.
BTW: This does not mean that you should not use strlen (I have been using something derived from it for quite some time in my prompt). But you should be aware of some limitations:
- It does not work with %G (obviously).
- It cannot handle numeric arguments for %{...%}, such as %3{...%}.
- It also does not recognize numeric arguments after % for foreground and background colors, such as %1F (instead of %F{1} or %F{red}).
- It cannot handle nested %{...%}, or really any } inside %{...%}. (This matters, for example, when intending to use %D{string} for date formatting, as the length of the format string string would have to match the length of the resulting date without using %{...%} around it.)
Lastly, there was a bug in the original definition and it should be:
local invisible='%([BSUbfksu]|([FK]|){*})'
The second B should be a K, as it is intended to match the prompt escape for background colors (%B starts boldface mode).