15

Is there an idiomatic way of encoding and decoding a string in Clojure as hexadecimal? Example from Python:

'Clojure'.encode('hex')
# ⇒ '436c6f6a757265'
'436c6f6a757265'.decode('hex')
# ⇒ 'Clojure'

To show some effort on my part:

(defn hexify [s]
  (apply str
    (map #(format "%02x" (int %)) s)))

(defn unhexify [hex]
  (apply str
    (map 
      (fn [[x y]] (char (Integer/parseInt (str x y) 16))) 
      (partition 2 hex))))

(hexify "Clojure")
;; ⇒ "436c6f6a757265"

(unhexify "436c6f6a757265")
;; ⇒ "Clojure"
Zaz
Iceland_jack

4 Answers

18

Since all posted solutions have some flaws, I'm sharing my own:

(defn hexify "Convert byte sequence to hex string" [coll]
  (let [hex [\0 \1 \2 \3 \4 \5 \6 \7 \8 \9 \a \b \c \d \e \f]]
    (letfn [(hexify-byte [b]
              (let [v (bit-and b 0xFF)]
                [(hex (bit-shift-right v 4)) (hex (bit-and v 0x0F))]))]
      (apply str (mapcat hexify-byte coll)))))

(defn hexify-str [s]
  (hexify (.getBytes s)))

and

(defn unhexify "Convert hex string to byte sequence" [s]
  (letfn [(unhexify-2 [c1 c2]
            (unchecked-byte
              (+ (bit-shift-left (Character/digit c1 16) 4)
                 (Character/digit c2 16))))]
    (map #(apply unhexify-2 %) (partition 2 s))))

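(defn unhexify-str [s]
  ;; (map char ...) throws on negative bytes (values >= 0x80), so decode the
  ;; bytes with the default charset, mirroring .getBytes in hexify-str
  (String. (byte-array (unhexify s))))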

Pros:

  • High performance
  • Generic byte stream <--> string conversions with specialized wrappers
  • Handles leading zeros in the hex output
Brad Koch
Grzegorz Luczywo
17

Your implementations don't work for non-ASCII characters:

(defn hexify [s]
  (apply str
    (map #(format "%02x" (int %)) s)))

(defn unhexify [hex]
  (apply str
    (map 
      (fn [[x y]] (char (Integer/parseInt (str x y) 16))) 
        (partition 2 hex))))

(= "\u2195" (unhexify(hexify "\u2195")))
false ; should be true 

To overcome this you need to serialize the bytes of the string using the required character encoding, which can be multi-byte per character.

There are a few 'issues' with this.

  • Remember that all integer types (byte, short, int, long) are signed on the JVM.
  • There is no unsigned byte type.
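To make the two points above concrete, here is a small REPL-style sketch. It assumes UTF-8 as the encoding, and `arrow-bytes` is just an illustrative name. It shows that one character can become several bytes, and that bytes at or above 0x80 come back as negative numbers:

```clojure
;; U+2195 encodes to three bytes in UTF-8: e2 86 95.
;; Because JVM bytes are signed, they show up as negative numbers.
(def arrow-bytes (vec (.getBytes "\u2195" "UTF-8")))

arrow-bytes
;; => [-30 -122 -107]

;; Masking with 0xff recovers the unsigned value of each byte.
(mapv #(bit-and % 0xff) arrow-bytes)
;; => [226 134 149]
```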

In idiomatic Java you would take the low byte of an integer by masking it, like this, wherever you use it.

    int intValue = 0x80;
    byte byteValue = (byte)(intValue & 0xff); // use only the low byte

    System.out.println("int:\t" + intValue);
    System.out.println("byte:\t" + byteValue);

    // output:
    // int:   128
    // byte:  -128

Clojure has (unchecked-byte), which effectively does the same thing.
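As a rough illustration of the difference between the checked and unchecked casts (a sketch of REPL behaviour, not part of the original answer):

```clojure
;; `byte` range-checks its argument, so 0x80 (128) is rejected.
(try
  (byte 0x80)
  (catch IllegalArgumentException e
    (.getMessage e)))
;; the checked cast throws; the catch returns the exception message

;; `unchecked-byte` keeps only the low 8 bits, like the Java cast.
(unchecked-byte 0x80)
;; => -128

;; Masking turns the signed byte back into the original 0..255 value.
(bit-and (unchecked-byte 0x80) 0xff)
;; => 128
```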

For example, using UTF-8 you can do this:

(defn hexify [s]
  (apply str (map #(format "%02x" %) (.getBytes s "UTF-8"))))

(defn unhexify [s]
  (let [bytes (into-array Byte/TYPE
                 (map (fn [[x y]]
                    (unchecked-byte (Integer/parseInt (str x y) 16)))
                       (partition 2 s)))]
    (String. bytes "UTF-8")))

; with the above implementation:

;=> (hexify "\u2195")
"e28695"
;=> (unhexify "e28695")
"↕"
;=> (= "\u2195" (unhexify (hexify "\u2195")))
true
Brad Koch
sw1nn
  • All this is fine as long as performance is no concern -- I bet the Python example will outperform these solutions on any longer string. If you need performance, there's a lot more work to do. – Marko Topolnik Apr 10 '12 at 13:35
5

Sadly the "idiom" appears to be using the Apache Commons Codec, e.g. as done in buddy:

(ns name-of-ns
  (:import org.apache.commons.codec.binary.Hex))

(defn str->bytes
  "Convert string to byte array."
  ([^String s]
   (str->bytes s "UTF-8"))
  ([^String s, ^String encoding]
   (.getBytes s encoding)))

(defn bytes->str
  "Convert byte array to String."
  ([^bytes data]
   (bytes->str data "UTF-8"))
  ([^bytes data, ^String encoding]
   (String. data encoding)))

(defn bytes->hex
  "Convert a byte array to hex encoded string."
  [^bytes data]
  (Hex/encodeHexString data))

(defn hex->bytes
  "Convert hexadecimal encoded string to bytes array."
  [^String data]
  (Hex/decodeHex (.toCharArray data)))
Jeremy Field
  • Nothing sad about using a tried-and-trusted library instead of reinventing the wheel – Andy Jun 14 '22 at 06:53
  • JDK17 seems to have [HexFormat](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/HexFormat.html) – Jeremy Field Jul 24 '22 at 23:58
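For what it's worth, the HexFormat route mentioned in the comment above could look roughly like the following from Clojure. This is a minimal sketch that assumes a JDK 17 or newer runtime and UTF-8 as the string encoding; the names `hexify` and `unhexify` simply mirror the question and are not part of any library.

```clojure
(import 'java.util.HexFormat)

;; Assumes JDK 17+ (java.util.HexFormat does not exist on older JDKs).
(defn hexify
  "Encode a string as lowercase hex, using its UTF-8 bytes."
  [^String s]
  (.formatHex (HexFormat/of) (.getBytes s "UTF-8")))

(defn unhexify
  "Decode a hex string back into a string, reading the bytes as UTF-8."
  [^String hex]
  (String. (.parseHex (HexFormat/of) hex) "UTF-8"))

(hexify "Clojure")
;; => "436c6f6a757265"

(unhexify "436c6f6a757265")
;; => "Clojure"
```

This avoids the extra dependency on Commons Codec, at the cost of requiring a recent JDK.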
4

I believe your unhexify function is as idiomatic as it can be. However, hexify can be written in a simpler way:

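(defn hexify [s]
  ;; signum 1 keeps the value non-negative when the first byte is >= 0x80;
  ;; "%x" still drops leading zero digits, so a first byte below 0x10 will not round-trip
  (format "%x" (java.math.BigInteger. 1 (.getBytes s))))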
Óscar López