How to print non-ascii characters in SBCL Common Lisp

Question

Assuming I have such a character stored in the variable `character`, how do I print it? For example, GREEK_SMALL_LETTER_XI, which has code 958.

`(format t "~a" character)` just prints `?`.

The character is printed correctly (ξ) both in emacs+slime with CCL as well as in terminal with SBCL, both on Linux and Mac OS X. Which is your environment (Operating system, Lisp implementation, terminal/editor)? — Renzo, Feb 03 '17 at 23:08
Windows 8.1, SBCL. After your comment I wrote `(let ((character (code-char 958)))(format t "~a" character))` in REPL (opened by sbcl.exe) and it returned `?` newline `NIL` — , Feb 03 '17 at 23:16
I think this is an issue with your terminal and not a common lisp issue. Can your terminal print the greek letters when you type the windows equivalent of `cat file-with-greek-letters.txt`? — anonymous, Feb 04 '17 at 09:44
You seem to be right. I tried `type PATH-TO-FILE` and output was string of some strange characters. Perhaps I'll simply install some linux. — , Feb 04 '17 at 10:50
@PrzemysławP Perhaps this thread can help you: http://stackoverflow.com/questions/388490/unicode-characters-in-windows-command-line-how Good luck :) — anonymous, Feb 04 '17 at 22:43
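
One way to separate a Lisp problem from a terminal problem is sketched below: ask which external format the output stream uses, then write the character to a UTF-8 file and inspect it in a Unicode-aware editor. This assumes the implementation accepts `:external-format :utf-8` (SBCL does). The helper name `write-char-as-utf8` and the file name `xi.txt` are made up for illustration.

```lisp
;; Hypothetical helper: write the character with the given code to PATH as UTF-8.
;; :external-format values are implementation-defined; :utf-8 is assumed here.
(defun write-char-as-utf8 (code path)
  (with-open-file (out path
                       :direction :output
                       :if-exists :supersede
                       :external-format :utf-8)
    (write-char (code-char code) out))
  path)

;; Standard CL: report the external format of the REPL's output stream.
(format t "~&stdout external format: ~s~%"
        (stream-external-format *standard-output*))

;; Code 958 is GREEK SMALL LETTER XI; the file should hold the bytes CE BE.
(write-char-as-utf8 958 "xi.txt")
```

If the file shows ξ in an editor but the console still shows `?`, the console encoding is the likely culprit rather than the Lisp code.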

freeB · Answer 1 · 2019-01-15T20:33:21.657

1

The OP mentioned in a comment that he was moving to Linux. In SBCL 1.4.15.Debian (and, I presume, on other Linuxes), Unicode characters outside the ASCII range are printed as the characters themselves by the (format) function with "~a", whereas (print) emits an escaped character code instead.

Example:

(print (code-char 26159)) produces "#\U662F"

which is the Unicode code point of the character in hexadecimal (662F hexadecimal is 26159 decimal),

while

(format T "~a" (code-char 26159)) produces "是"
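
The two renderings can be compared without involving the terminal by collecting them as strings. This is a minimal sketch using only standard functions; the helper name `char-renderings` is hypothetical, and the exact text of the escaped form depends on the implementation and its version.

```lisp
;; Hypothetical helper: collect both renderings of CH as strings.
(defun char-renderings (ch)
  (list :aesthetic (princ-to-string ch)   ; what ~a and princ emit
        :readable  (prin1-to-string ch))) ; what ~s, prin1 and print emit

(let* ((ch (code-char 26159))
       (r  (char-renderings ch)))
  ;; ~a agrees with princ; ~s agrees with prin1.
  (assert (string= (format nil "~a" ch) (getf r :aesthetic)))
  (assert (string= (format nil "~s" ch) (getf r :readable)))
  ;; The aesthetic form is just the character itself.
  (assert (string= (getf r :aesthetic) (string ch))))
```

So `princ` behaves like `(format t "~a" ...)` here, which is another option when the goal is output for a human reader.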

edited Jan 15 '19 at 20:33

answered Jan 13 '19 at 23:58


That's because `print` here prints a _character_, and does so readably (by the Lisp reader) by default. On the other hand, format produces output according to the given string template. Both examples are correct. – Svante Jan 14 '19 at 23:18
SO stopped me from editing my comment. I'll try again. Print does not print Chinese (and I assume other Unicode) characters correctly (it prints out a code, not the character). It does, however, print characters that fall within the Western European ASCII block correctly (it prints out a character for these). It's not a big deal, since lisp provides the (format) function. Nevertheless, I have amended the answer, as all that is important, is that readers know which function to use, and a definition of 'correct' might invite hair-splitting arguments concerning 'what is a character'. – freeB Jan 15 '19 at 20:57

No. The output of `print` for a character is a printed representation of _that character_, so that `read` will read it as _that same character_. If `print` were to output the string "是", then `read` would read a symbol named "是" from that. A character is not a string and is not a symbol. If anything, you might expect `print` to output "#\是", but that depends on direct unicode support on the reading side. See the CLHS for `print`, `*print-escape*`, `*print-readably*`, and section 22.1.3.2 (Printing Characters), and further info linked from there. – Svante Jan 16 '19 at 08:04
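
The round trip described in this comment can be checked directly. The sketch below assumes the default reader and printer settings; the helper name `round-trips-p` is hypothetical. What the reader makes of the bare, unescaped character is implementation-dependent, so the code only prints its type instead of asserting it.

```lisp
;; Hypothetical helper: true if READ recovers CH from its PRIN1 output.
(defun round-trips-p (ch)
  (eql ch (read-from-string (prin1-to-string ch))))

(let ((ch (code-char 26159)))
  ;; The escaped form reads back as the same character object.
  (assert (round-trips-p ch))
  ;; The unescaped form is a one-character string; reading it does not
  ;; give a character back. Print whatever type the reader produces.
  (format t "~&bare form reads as: ~s~%"
          (type-of (read-from-string (princ-to-string ch)))))
```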

No, that's not the point. The point is that `print` produces output for `read`, while `format` produces output for humans (for whom the distinction between characters and strings is often not very relevant). `Read` doesn't care about aesthetics. The escaping just makes the output independent from encoding hassles, and `read` couldn't care less about appearances. – Svante Jan 16 '19 at 14:57

@Svante I think that point has already been made and taken. I think it is slightly silly to say “`write` is for machines to `read` and `format` is used for humans” because humans often do read the output of `write` in the REPL, in the debugger, and so on, so it is probably bad for it not to be easily human-readable. @freeB SBCL prints a character in either its Unicode code form or its name form unless the character satisfies both `standard-char-p` and `graphic-char-p`, which basically means printable ASCII. I think it would be nice if there were a switch to allow printable Unicode too. – Dan Robertson Jan 20 '19 at 22:45
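
The rule described in this comment can be written as a predicate. This is a sketch of the rule as stated here, not SBCL's actual printer code, and the name `plain-printable-p` is hypothetical.

```lisp
;; Hypothetical predicate: true only for characters that are both
;; standard and graphic, i.e. the printable ASCII characters.
(defun plain-printable-p (ch)
  (and (standard-char-p ch)
       (graphic-char-p ch)
       t))

(assert (plain-printable-p #\a))
;; Newline is a standard character but not a graphic one.
(assert (not (plain-printable-p #\Newline)))
;; GREEK SMALL LETTER XI is not a standard character.
(assert (not (plain-printable-p (code-char 958))))
```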
@DanRobertson: I wrote about `print`, not `write`. `Print` explicitly is set up to produce `read`able text. Now imagine that it outputs the character in UTF-8 encoding, which then gets `read` by an implementation running on Windows and expecting CP-1252 encoding. Bummer. – Svante Jan 20 '19 at 23:28
@DanRobertson. A switch that directs the format of the print statement is a great idea, and would be very useful, at least for some users. Unicode is a moving target, and I would guess its implementation and processing involves difficult design decisions in a general purpose language, like Lisp. I appreciate the comments from yourself and Svante, which have both taught me something. – freeB Jan 21 '19 at 09:09
@Svante well sure, but `print` is essentially the same as `write` with `*print-escape*` set to `t`, a newline before, and a space afterwards. I appreciate that printing in ASCII was chosen because some systems (e.g. doing file I/O to the console on Windows instead of console I/O) struggle with Unicode, and there are also encoding issues (e.g. UTF-8 and friends), but there are already portability issues within ASCII from the different line endings. I don’t think compatibility with broken systems is a good reason for the REPL/debugger to completely obscure the characters it has. – Dan Robertson Jan 21 '19 at 11:02
@DanRobertson: Exactly, `print` is a shortcut for doing `write` `read`ably. I see a very explicit intent in that, and I believe that this intent should be followed as closely as possible. – Svante Jan 21 '19 at 12:06
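
The escaping discussed in this thread can be switched per call with the standard `:escape` keyword of `write` and `write-to-string`. The sketch below assumes default printer settings (in particular `*print-readably*` is `nil`); the helper name `render-char` is hypothetical.

```lisp
;; Hypothetical helper: render CH with escaping switched on or off.
(defun render-char (ch &key escape)
  (write-to-string ch :escape escape))

(let ((ch (code-char 26159)))
  ;; Without escaping, the result is the character itself, as with PRINC.
  (assert (string= (render-char ch :escape nil) (string ch)))
  ;; With escaping, the result matches PRIN1.
  (assert (string= (render-char ch :escape t) (prin1-to-string ch)))
  ;; PRINT is PRIN1 preceded by a newline and followed by a space.
  (assert (string= (with-output-to-string (s) (print ch s))
                   (concatenate 'string
                                (string #\Newline)
                                (prin1-to-string ch)
                                " "))))
```

This does not add the implementation-level switch wished for above; it only shows that the caller can already choose between the two forms.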
