
I'm a beginning user of both Emacs and Clojure, testing my working environment with some simple text processing. I'm having problems getting the Slime REPL to properly print UTF-8 text stored in a vector.

I start by reading the contents of a file (a dictionary of Tocharian B) into a vector:

user> (def toch
        (with-open [rdr (java.io.BufferedReader.
                         (java.io.FileReader. "/directory/toch.txt"))]
          (vec (line-seq rdr))))
=> #'user/toch

I then try to get a line from the vector, and I get garbage:

user> (toch 44)
=> " Examples :   /// kektseñe akappi ste ‘the body is an impurity’ (121b5), akappī = BHS aśuciṃ (529a3). "

I can enter the string into the Slime REPL and get it back as it should be:

user> " Examples :   /// kektseñe akappi ste ‘the body is an impurity’ (121b5), akappī = BHS aśuciṃ (529a3). "
=> " Examples :   /// kektseñe akappi ste ‘the body is an impurity’ (121b5), akappī = BHS aśuciṃ (529a3). "

And I can print to disk without any problem:

user> (binding [*out* (java.io.FileWriter. "test.txt")]
        (prn (toch 44)))
=> nil
[Contents of test.txt: " Examples :   /// kektseñe akappi ste ‘the body is an impurity’ (121b5), akappī = BHS aśuciṃ (529a3). "]

Getting lines from the vector in other REPLs (e.g. clj, lein repl) also works fine. The problem only appears when I look at the contents of the vector within the Slime REPL.

What's going on here? Is there some miscommunication between Emacs and Swank? How can I fix this?

liwp
nmashton
  • Weird. What is the result of `(int \ṃ)` when entered into the SLIME REPL? – Matthias Benkard Feb 17 '12 at 08:45
  • Since that is the correct answer, there must be something wrong with the way the file is read. `((toch 44) 91)` should yield `7747` as well. If it does, then this issue is a complete mystery to me. If, on the other hand, it doesn't, then you need to check what encoding Java assumes when reading the file. – Matthias Benkard Feb 17 '12 at 20:04
  • Sorry, `(int (.charAt (toch 44) 91))` is what I meant. – Matthias Benkard Feb 17 '12 at 20:14
  • It gives the wrong answer. So I replaced FileReader with an InputStreamReader wrapped around a FileInputStream, supplied InputStreamReader with the correct encoding, and now it works. So the problem must be that FileReader assumes the wrong encoding. Thank you! – nmashton Feb 18 '12 at 02:12
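The fix described in the last comment might look like this in Clojure. This is a minimal sketch, assuming the dictionary file is UTF-8 encoded; `read-lines` is a hypothetical helper name, not part of Clojure or any library. It builds the same reader chain the comment mentions (FileInputStream, then InputStreamReader with an explicit encoding, then BufferedReader) instead of the single-argument FileReader, which decodes with the JVM's default charset.

```clojure
;; Hypothetical helper: read every line of a text file into a vector,
;; decoding the bytes with an explicit charset name (e.g. "UTF-8")
;; rather than the JVM default that FileReader would use.
(defn read-lines
  [path encoding]
  (with-open [rdr (java.io.BufferedReader.
                   (java.io.InputStreamReader.
                    (java.io.FileInputStream. path)
                    encoding))]
    ;; vec realizes the lazy line-seq before with-open closes the reader
    (vec (line-seq rdr))))
```

With this helper the question's definition becomes `(def toch (read-lines "/directory/toch.txt" "UTF-8"))`, after which `(int (.charAt (toch 44) 91))` should return 7747 (ṃ, U+1E43), as the comments expect. `clojure.java.io/reader` is a shorter alternative: it decodes as UTF-8 unless you pass a different `:encoding` option.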

1 Answer


Try putting

(setq slime-net-coding-system 'utf-8-unix)

into your .emacs file (or setting and saving the variable via M-x customize-variable).

In addition, make sure that you are running Clojure from within a UTF-8-enabled locale (if you're on Un*x and using Leiningen, try something like env LC_ALL=en_US.UTF-8 lein swank).
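To check whether the JVM behind swank actually picked up a UTF-8 locale, you can ask it which charset it falls back to when none is given; that is the charset the single-argument FileReader and FileWriter constructors use. The sketch below is a hedged diagnostic, and `jvm-encoding-info` is a hypothetical helper name. Note that on JDK 18 and later the default charset is UTF-8 unless explicitly overridden (JEP 400), so the locale mostly matters on older JVMs.

```clojure
;; Hypothetical helper: report the charset this JVM uses when code
;; (such as the one-argument FileReader constructor) does not name one.
(defn jvm-encoding-info []
  {:default-charset (str (java.nio.charset.Charset/defaultCharset))
   ;; the system property the default is traditionally derived from
   :file-encoding   (System/getProperty "file.encoding")})
```

Evaluating `(jvm-encoding-info)` in the SLIME REPL and in a plain `lein repl` and comparing the two results would show whether the swank JVM is decoding files differently.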

Matthias Benkard
  • Unfortunately, neither of those fixes it. I already had slime-net-coding-system set to utf-8-unix. Changing the swank server's locale didn't change anything. – nmashton Feb 16 '12 at 18:47