I was trying to print the Chinese string "哈哈" in Clojure. The run-time environment is Windows 7, cmd.exe. The default code page is CP936 (GBK). I can view a GBK-encoded source file under cmd.exe with "哈哈" displayed correctly, just by running type core.clj.

I know I can change cmd.exe's code page to 65001 to enable UTF-8, but I do want to know:

  1. Is it stupid to try printing GBK characters under Win7 cmd.exe using a Java program?
  2. Can I "generate" a string with GBK encoding in Clojure?

I used Leiningen to set up the project, and here is the project.clj file:

(defproject fibo "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.5.1"]]
  :jvm-opts ["-Dfile.encoding=utf-8"]
  :main fibo.core)

The source code is simply:

(ns fibo.core
  (:gen-class))

(defn -main
  [& args]
  ;; work around dangerous default behaviour in Clojure
  (alter-var-root #'*read-eval* (constantly false))
  (println "哈哈"))

The output looks like:

D:...\_dev\fibo> lein run
????

I also tried calling lein run after setting JAVA_OPTION -Dfile.encoding=xxx. Unfortunately, none of UTF-8 / GBK / GB18030 / ANSI / CP936 helped; I always got ????.

One thing to clarify: when I used _JAVA_OPTION to change file.encoding, I didn't use :jvm-opts ["-Dfile.encoding=utf-8"] at the same time. Only after I had tried all of the above encodings without luck did I add :jvm-opts to project.clj, with UTF-8 as the default encoding.
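
To check whether a file.encoding setting actually reached the JVM, something like the following could be run from -main or pasted into lein repl. It is only a minimal diagnostic sketch: the function name encoding-report is made up for illustration, and it reports the JVM's defaults, not the code page cmd.exe is using.

```clojure
(import 'java.nio.charset.Charset)

(defn encoding-report
  "Return the encodings the running JVM reports (hypothetical helper)."
  []
  {;; value of the system property set via -Dfile.encoding=...
   :file-encoding   (System/getProperty "file.encoding")
   ;; charset the JVM uses by default when none is given
   :default-charset (str (Charset/defaultCharset))
   ;; whether this JVM ships a GBK charset at all
   :gbk-supported?  (Charset/isSupported "GBK")})

(println (encoding-report))
```

If the printed :file-encoding does not match the value that was passed in, the option probably never reached the JVM that runs the program.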

pimgeek
  • After a little more investigation on StackOverflow, I found this helpful [Q&A about cmd and Unicode](http://stackoverflow.com/questions/1259084/what-encoding-code-page-is-cmd-exe-using/1259468#1259468), which showed me that I lack basic knowledge of Unicode and console fonts. The strange behavior of cmd.exe just adds more complexity. In summary, I need to learn more about encoding itself. – pimgeek Jul 05 '13 at 06:56

1 Answer


I think it depends on the encoding of your source file. And yes, I think using GBK with Java is somewhat ... stupid. My files are UTF-8 encoded; I tested them under Win7 and Ubuntu, and both display correctly. Since Clojure is based on Java, and in my experience Java handles GBK poorly, I suggest you always use UTF-8. If you have to use GBK, some Java functions can convert between GBK and UTF-8.
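
As a rough sketch of that conversion, the standard java.lang.String and java.io.PrintStream classes can be called from Clojure as below. The namespace and the function names (gbk-bytes, gbk->utf8-bytes, println-gbk) are made up for illustration, and the code assumes the JVM ships the GBK charset. A Java String has no encoding of its own; an encoding is only chosen when the string is turned into bytes or written to a stream.

```clojure
(ns fibo.encoding-sketch
  (:import (java.io PrintStream)))

(defn gbk-bytes
  "Encode a string as GBK bytes."
  [^String s]
  (.getBytes s "GBK"))

(defn gbk->utf8-bytes
  "Decode GBK bytes into a String, then re-encode it as UTF-8 bytes."
  [^bytes bs]
  (.getBytes (String. bs "GBK") "UTF-8"))

(defn println-gbk
  "Print a string to stdout as GBK bytes, regardless of file.encoding."
  [^String s]
  ;; autoflush = true so the line is written immediately
  (let [out (PrintStream. System/out true "GBK")]
    (.println out s)))
```

Calling (println-gbk "哈哈") from -main instead of println would send GBK bytes to the console. Whether that displays correctly still depends on the source file being read with the right encoding and on the console's code page being CP936.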

user2545464