How does one convert a file that has been Base64 encoded back to its original format and write it to disk? For instance, I have a PDF file which has been Base64 encoded as a data URI. The file starts with:

data:application/pdf;base64,JVBER

I would like to write this out to disk in the proper format. I have tried several libraries (e.g. ring.util.codec) that decode the string into a byte-array, but if I write the resulting byte-array out to a file (using spit) the file appears corrupted.

UPDATE:

The PHP function base64_decode appears to be doing what I am looking for, as it returns a string. What is the equivalent in Java?

mac
  • Have you looked on the internet for base64 tools? Or in Linux have you searched your package repository? – Kerrek SB Aug 21 '11 at 22:37
  • http://stackoverflow.com/questions/469695/decode-base64-data-in-java – Nicolas Buduroi Aug 21 '11 at 22:41
  • I have and I also read the question referenced above. I can decode the string into a byte array, but how do I write this to a file in a way that turns the contents into the original file format? – mac Aug 21 '11 at 22:47

2 Answers

In Clojure, there is data.codec (formerly in clojure-contrib).

Using Java interoperability:

These are the helper functions I used for images with data.codec:

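(require '[clojure.data.codec.base64 :as b64-codec]
         '[clojure.java.io :as io])

;; Helpers are defined before write-img! because Clojure resolves
;; symbols at compile time; calling an undefined function fails to compile.

(defn in?
  "true if the seq coll contains the element el"
  [coll el]
  (some #(= el %) coll))

(defn b64-ext
  "Returns the image type from the data URI header (png or jpeg only)."
  [s]
  (if-let [ext (second (re-find #"data:image/(.*);base64.*" s))]
    (if (in? ["png" "jpeg"] ext)
      ext
      (throw (Exception. (str "Unsupported extension found for image " ext))))
    (throw (Exception. (str "No extension found for image " s)))))

(defn chop-header
  "Drops the data:image/...;base64, prefix and returns the Base64 payload.
  The pattern only matches image types; adjust it for other MIME types."
  [s]
  (nth (re-find #"(data:image/.*;base64,)(.*)" s) 2))

(defn decode-str
  "Decodes a Base64 string into a byte array."
  [s]
  (b64-codec/decode (.getBytes s)))

(defn write-img!
  "Decodes the data URI b64 and writes the raw bytes to <id>.<ext>."
  [id b64]
  (io/copy
   (decode-str (chop-header b64))
   (java.io.File. (str "/Users/nha/tmp/" id "." (b64-ext b64)))))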
nha

Any Java library should work: there is one in Apache Commons, and one written entirely in Clojure in clojure-contrib.

I suspect the content is being modified somewhere along the way: for example, the bytes may be converted to a string using one encoding, and that string is then converted back to bytes using a different encoding.

The first step may be to check you have the exact same number of bytes in the file on the server side, and the file you are trying to read. Also, try to confirm the checksum (MD5) is the same.
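The check described above can be sketched in Clojure roughly as follows. This is a minimal sketch, not part of the original answer: `file-bytes` and `md5-hex` are hypothetical helper names, and it assumes the file is small enough to read into memory.

```clojure
(require '[clojure.java.io :as io])
(import '(java.security MessageDigest))

(defn file-bytes
  "Reads the whole file at path into a byte array."
  [path]
  (java.nio.file.Files/readAllBytes (.toPath (io/file path))))

(defn md5-hex
  "Returns the MD5 digest of a byte array as a lowercase hex string."
  [^bytes data]
  (let [digest (.digest (MessageDigest/getInstance "MD5") data)]
    ;; %02x prints each byte as two hex digits, treating it as unsigned
    (apply str (map #(format "%02x" %) digest))))
```

Comparing `(alength (file-bytes path))` and `(md5-hex (file-bytes path))` for the original file and for the decoded copy shows whether the two are byte-for-byte identical.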

In any case, a PDF file is a binary file, so you should NOT convert it to a string anywhere; keep it as raw bytes from decoding through to writing the file.
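One place where a string conversion can sneak in is `spit`, which calls `str` on its content before writing, so a byte array ends up on disk as text such as `[B@1a2b3c` rather than as its contents. A minimal sketch of writing the bytes through an output stream instead (`write-bytes!` is a hypothetical helper name, not from the original answer):

```clojure
(require '[clojure.java.io :as io])

(defn write-bytes!
  "Writes a byte array to path as-is, with no character encoding step."
  [path ^bytes data]
  (with-open [out (io/output-stream path)]
    (.write out data)))
```

`(io/copy data (io/file path))` should have the same effect, since `clojure.java.io/copy` accepts a byte array as input and a `File` as output.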

Nicolas Modrzyk
  • I did check the integrity of the file and it has not been corrupted. It also can be converted using the PHP base64_decode function without any issues. – mac Aug 22 '11 at 08:24
  • can you make the raw bytes available somewhere ? – Nicolas Modrzyk Aug 23 '11 at 01:20
  • I solved it. Apparently the data URI header was messing up the decoding. If I chop off the header "data:application/pdf;base64," it works. – mac Aug 23 '11 at 09:18