With the base64-encoded string JVBERi0xLjENCiXi48/TDQoxIDAgb2JqDQo8PCAN
I am getting different results from Emacs than from the Clojure code below.
Can anyone explain to me why?
The elisp below gives the correct output, ultimately giving me a valid PDF document (when I paste the entire string). I am sure my Emacs buffer is set to UTF-8:
(base64-decode-string "JVBERi0xLjENCiXi48/TDQoxIDAgb2JqDQo8PCAN")
"%PDF-1.1
%âãÏÓ
1 0 obj
<<
Here is the same output with the chars as octal escapes:
"%PDF-1.1
%\342\343\317\323
1
The Clojure below gives incorrect output, rendering the PDF document invalid when I give it the entire string:
(import 'java.util.Base64 )
(defn decode [to-decode]
  (let [byts    (.getBytes to-decode "UTF-8")
        decoded (.decode (java.util.Base64/getDecoder) byts)]
    (String. decoded "UTF-8")))
(decode "JVBERi0xLjENCiXi48/TDQoxIDAgb2JqDQo8PCAN")
"%PDF-1.1
%����
1 0 obj
<<
Same output, with the chars as octal escapes. I couldn't even copy/paste this; I had to type it in. This is what the first three lines look like when I open the PDF in text-mode:
"%PDF-1.1
%\357\277\275\357\277\275\357\277\275\357\277\275
1"
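A possible way to see where the two outputs diverge is to look at the raw bytes before any String is built. The sketch below is only an illustration: the names sample, decode-bytes, unsigned-seq and round-trip are made up for this example. It assumes the cause is that the four bytes after the second "%" are not valid UTF-8, so the UTF-8 String constructor substitutes U+FFFD for each of them, while ISO-8859-1 maps every byte to exactly one character.

```clojure
(import 'java.util.Base64)

;; The sample string from the question.
(def sample "JVBERi0xLjENCiXi48/TDQoxIDAgb2JqDQo8PCAN")

(defn decode-bytes
  "Decodes Base64 text to a raw byte array; no charset is involved here."
  [^String s]
  (.decode (Base64/getDecoder) s))

(defn unsigned-seq
  "Views signed Java bytes as integers in the range 0..255."
  [bs]
  (mapv #(bit-and % 0xFF) bs))

(defn round-trip
  "Builds a String from the bytes with the given charset, then encodes it back."
  [^bytes bs ^String charset]
  (.getBytes (String. bs charset) charset))

(def raw (decode-bytes sample))

;; The four bytes after the second "%" sit at offsets 11..14.
(def marker (subvec (unsigned-seq raw) 11 15))

;; Octal view, comparable to the \342\343\317\323 escapes shown by Emacs.
(println (map #(Integer/toOctalString %) marker))

;; ISO-8859-1 returns the original bytes; UTF-8 does not, because each
;; invalid byte was replaced by U+FFFD (bytes EF BF BD) on the way in.
(println (= (seq raw) (seq (round-trip raw "ISO-8859-1"))))
(println (= (seq raw) (seq (round-trip raw "UTF-8"))))
```

If this reading is right, the \357\277\275 sequences above are the UTF-8 encoding of U+FFFD, and the original byte values are already gone by the time the String exists.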
Edit: taking Emacs out of the equation.
If I write the encoded string to a file called encoded.txt and pipe it through the Linux program base64 --decode, I get valid output and a good PDF as well.
This is the Clojure:
(defn decode [to-decode]
  (let [byts           (.getBytes to-decode "ASCII")
        decoded        (.decode (java.util.Base64/getDecoder) byts)
        flip-negatives #(if (neg? %) (char (+ 255 %)) (char %))]
    (String. (char-array (map flip-negatives decoded)))))
(spit "./output/decoded.pdf" (decode "JVBERi0xLjENCiXi48/TDQoxIDAgb2JqDQo8PCAN"))
(spit "./output/encoded.txt" "JVBERi0xLjENCiXi48/TDQoxIDAgb2JqDQo8PCAN")
Then this at the shell:
➜ output git:(master) ✗ cat encoded.txt| base64 --decode > decoded2.pdf
➜ output git:(master) ✗ diff decoded.pdf decoded2.pdf
2c2
< %áâÎÒ
---
> %����
➜ output git:(master) ✗
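The diff shows áâÎÒ where Emacs showed âãÏÓ, so every high character is off by one. A likely reason is the (+ 255 %) in flip-negatives: Java bytes are signed, and the unsigned value of a negative byte is the byte plus 256, which is what masking with 0xFF yields. The helper names unsigned and off-by-one below are hypothetical and only contrast the two calculations.

```clojure
(defn unsigned
  "Maps a signed Java byte (-128..127) to its unsigned value (0..255)."
  [b]
  (bit-and b 0xFF))

(defn off-by-one
  "The calculation used by flip-negatives in the question, kept for comparison."
  [b]
  (if (neg? b) (+ 255 b) b))

;; 0xE2 is stored in a Java byte as -30.
(def b (unchecked-byte 0xE2))

(println (char (unsigned b)))    ; the character at code point 226
(println (char (off-by-one b)))  ; the character at code point 225
```

Even with the mask, spit encodes a String as UTF-8 unless :encoding is passed, so characters above 127 would still be written as two bytes each.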
Update: this seems to work
Alan Thompson's answer below put me on the correct track, but geez what a pain to get there. Here's the idea of what works:
(def iso-latin-1-charset (java.nio.charset.Charset/forName "ISO-8859-1" ))
(as-> some-giant-string-i-hate-at-this-point $
(.getBytes $)
(String. $ iso-latin-1-charset)
(base64/decode $ "ISO-8859-1")
(spit "./output/a-pdf-that-actually-works.pdf" $ :encoding "ISO-8859-1" ))
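For comparison, here is a minimal sketch of an alternative that never turns the decoded data into a String, so no charset is involved at all. It is not the approach from the answer mentioned above; decode-to-file! is a hypothetical helper name, and it assumes the input has no line breaks (the basic java.util.Base64 decoder rejects them, while Base64/getMimeDecoder accepts them).

```clojure
(require '[clojure.java.io :as io])
(import 'java.util.Base64)

(defn decode-to-file!
  "Decodes Base64 text and writes the resulting bytes to path unchanged."
  [^String b64 path]
  (let [raw (.decode (Base64/getDecoder) b64)]
    ;; output-stream writes bytes, unlike spit, which writes characters.
    (with-open [out (io/output-stream path)]
      (.write out ^bytes raw))
    (alength ^bytes raw)))
```

Called as (decode-to-file! "JVBERi0xLjENCiXi48/TDQoxIDAgb2JqDQo8PCAN" "decoded.pdf"), it returns the number of bytes written, and the file should then match what base64 --decode produces for the same input.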