I'm looking for a function that does the reverse of Clojure's Hiccup, turning HTML back into Hiccup vectors

so

   <html></html>

turns into

[:html]

etc.


Following up on the answer by @kotarak, this now works for me:

(use 'net.cgrand.enlive-html)
(import 'java.io.StringReader)

(defn enlive->hiccup
   [el]
   (if-not (string? el)
     (->> (map enlive->hiccup (:content el))
       (concat [(:tag el) (:attrs el)])
       (keep identity)
       vec)
     el))

(defn html->enlive 
  [html]
  (first (html-resource (StringReader. html))))

(defn html->hiccup [html]
  (-> html
      html->enlive
      enlive->hiccup))

=> (html->hiccup "<html><body id='foo'>hello</body></html>")
[:html [:body {:id "foo"} "hello"]]
zcaudate
  • For example, if I were working with a designer who gave me a bunch of HTML files, I would have to 'translate' them by hand. Most web tooling doesn't output Hiccup structures, and it's a hassle to do anything with HTML output when I'm working with Hiccup. This way I can put the HTML into the 'translator' and get the code I need. – zcaudate Jun 19 '12 at 05:49
  • @zcaudate Heretical question: why don't you use Enlive, then? – kotarak Jun 19 '12 at 14:32
  • @kotarak It's a preference and workflow thing. Essentially, I found that my brain's not fast enough to switch back and forth between HTML and Clojure when I'm tweaking stuff. I like having all my views and templates readily accessible in one big file to cut/paste/insert, instead of splitting them into HTML and code. And it's nice to work the same way in ClojureScript with the Hiccup equivalent, Crate. – zcaudate Jun 19 '12 at 18:19

4 Answers

You could use html-resource from Enlive to get a structure like this:

{:tag :html :attrs {} :content []}

Then traverse this and turn it into a hiccup structure.

(defn html->hiccup
   [html]
   (if-not (string? html)
     (->> (map html->hiccup (:content html))
       (concat [(:tag html) (:attrs html)])
       (keep identity)
       vec)
     html))

Here is a usage example:

user=>  (html->hiccup {:tag     :p
                       :content ["Hello" {:tag     :a
                                          :attrs   {:href "/foo"}
                                          :content ["World"]}
                                 "!"]})
[:p "Hello" [:a {:href "/foo"} "World"] "!"]
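
The traversal above assumes every node is either a string or an element map. As a hedged variant (not part of the original answer), the sketch below skips anything else, on the assumption that Enlive represents comments and doctypes as maps without a :tag key. The name `node->hiccup` is hypothetical, and unlike the original it also drops empty attribute maps.

```clojure
(defn node->hiccup
  "Converts an Enlive-style node into Hiccup.
  Returns nil for anything that is neither a string nor an element map."
  [node]
  (cond
    (string? node) node
    (:tag node)    (let [{:keys [tag attrs content]} node]
                     (into (if (seq attrs) [tag attrs] [tag])
                           ;; keep drops the nils returned for skipped nodes
                           (keep node->hiccup content)))
    ;; Assumption: comments, doctypes, etc. carry no :tag and are ignored.
    :else          nil))

(node->hiccup {:tag     :p
               :content ["Hello"
                         {:type :comment :data "ignored"}
                         {:tag :a :attrs {:href "/foo"} :content ["World"]}
                         "!"]})
;; => [:p "Hello" [:a {:href "/foo"} "World"] "!"]
```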
kotarak
  • Thanks! I tried looking at Enlive before but was confused by the fact that it takes a file as input. Is there any way to pass Enlive a string instead of a resource? – zcaudate Jun 19 '12 at 05:51
  • I'd expect you can define a simple helper function: `(defn str-resource [s] (html-resource (StringReader. s)))`. Not tested. – kotarak Jun 19 '12 at 06:28
  • There are better answers now that libraries exist; see [below](https://stackoverflow.com/a/26006907/63009) – Stuart Sierra Jun 09 '17 at 19:31

There is a page on the Hiccup GitHub wiki:

https://github.com/weavejester/hiccup/wiki/Converting-html-to-hiccup

which links to three solutions:

https://github.com/davidsantiago/hickory

https://github.com/nathell/clj-tagsoup

https://github.com/hozumi/hiccup-bridge

(Oddly, I found this question and that wiki page in the same search just now... and I was the most recent editor of that Wiki page, 2 years ago.)

Kyle Cordes

There is now Hickory, which does this: https://github.com/davidsantiago/hickory
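
As a minimal sketch of what that looks like on the JVM, assuming Hickory is already on the classpath (dependency coordinates are left out here), the example below uses `parse-fragment` and `as-hiccup` from `hickory.core`. The namespace and the helper name `html-fragment->hiccup` are made up for illustration.

```clojure
(ns example.hickory-demo
  (:require [hickory.core :as hickory]))

(defn html-fragment->hiccup
  "Parses an HTML fragment string and returns a seq of Hiccup forms,
  one per top-level node in the fragment."
  [html]
  (map hickory/as-hiccup (hickory/parse-fragment html)))

;; Returns a seq because a fragment may contain several sibling nodes.
(html-fragment->hiccup "<a href=\"foo\">foo</a>")
```

For a whole document, `hickory.core/parse` can be used in place of `parse-fragment`. It appears to wrap the result in html, head and body elements, so the fragment variant is usually closer to what a Hiccup template needs.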

John

Here is a snippet of code that I wrote which, unlike Hickory, runs truly cross-platform without relying on the browser:

(ns hiccdown.html
  (:require [clojure.edn :as edn]
            [instaparse.core :as insta :refer [defparser]]))

(defparser html-parser "
  nodes = node*
  <node> = text | open-close-tags | self-closing-tag
  open-close-tags = opening-tag nodes closing-tag
  opening-tag = <'<'> <spaces>? tag-name attributes? <spaces>? <'>'>
  closing-tag = <'</'> tag-name <'>'>
  self-closing-tag = <'<'> <spaces>? tag-name attributes? <spaces>? <'/>'>
  tag-name = #'[^ </>]+'
  attributes = (<spaces> attribute)+
  attribute = attribute-name (<'='> attribute-value)?
  <attribute-name> = #'[^ \t=]+'
  <attribute-value> = #'[^ \t]+' | #'\"[^\"]*\"'
  <text> = #'[^<]+'
  spaces = #'[ \t]+'
")

(defn html->hiccup [html-str]
  (->> (html-parser html-str)
       (insta/transform {:nodes            (fn [& nodes] nodes)
                         :open-close-tags  (fn [opening-tag nodes _closing-tag]
                                             (into opening-tag nodes))
                         :opening-tag      (fn ([tag-name] [tag-name])
                                               ([tag-name attributes] [tag-name attributes]))
                         :self-closing-tag (fn ([tag-name] [tag-name])
                                               ([tag-name attributes] [tag-name attributes]))
                         :tag-name         keyword
                         :attributes       (fn [& attributes]
                                             (into {} attributes))
                         :attribute        (fn ([attribute-name]
                                                [(keyword attribute-name) true])
                                               ([attribute-name attribute-value]
                                                [(keyword attribute-name) (edn/read-string attribute-value)]))})))

Update: This snippet became a Clojure lib.

Vincent Cantin