I'm new to Clojure and have been working with Enlive to transform the text nodes of HTML documents. My end goal is to convert the structure back into HTML, tags and all.

I'm currently able to take the structmap returned by enlive-html/html-resource and transform it back to HTML using

(apply str (html/emit* nodes))

where nodes is the structmap.

I'm also able to transform the structmap's :content text nodes as I wish. However, after transforming the content text nodes of the structmap, I end up with a lazy seq of MapEntries. I want to transform this back into a structmap so I can use emit* on it. This is a little tricky because the lazy seqs and structmaps are nested.

tldr:

How do I transform:

([:tag :html]
 [:attrs nil]
 [:content
  ("\n"
   ([:tag :head]
    [:attrs nil]
    [:content
     ("\n  "
      ([:tag :title] [:attrs nil] [:content ("Page Title")])
      "  \n")])
   "\n"
   ([:tag :body]
    [:attrs nil]
    [:content
     ("\n  "
      ([:tag :div]
       [:attrs {:id "wrap"}]
       [:content
        ("\n    "
         ([:tag :h1] [:attrs nil] [:content ("header")])
         "\n    "
         ([:tag :p] [:attrs nil] [:content ("some paragrah text")])
         "\n  ")])
      "\n")])
   "\n\n")])

into:

{:tag :html,
 :attrs nil,
 :content
 ("\n"
  {:tag :head,
   :attrs nil,
   :content
   ("\n  " {:tag :title, :attrs nil, :content ("Page Title")} "  \n")}
  "\n"
  {:tag :body,
   :attrs nil,
   :content
   ("\n  "
    {:tag :div,
     :attrs {:id "wrap"},
     :content
     ("\n    "
      {:tag :h1, :attrs nil, :content ("header")}
      "\n    "
      {:tag :p, :attrs nil, :content ("some paragrah text")}
      "\n  ")}
    "\n")}
  "\n\n")}

Update

kotarak's response pointed me in the direction of update-in, which I was able to use to modify the map in place without transforming it to a sequence, thus rendering my question irrelevant.

(defn modify-or-go-deeper
  "If item is a map, updates its content, else if it's a string, modifies it"
  [item]
  (declare update-content)
  (cond
    (map? item) (update-content item)
    (string? item) (modify-text item)))

(defn update-content
  "Calls modify-or-go-deeper on each element of the :content sequence"
  [coll]
  (update-in coll [:content] (partial map modify-or-go-deeper)))

I was using for on the map before, but update-in is the way to go.
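
To illustrate the difference, here is a minimal sketch using a single plain map in place of the Enlive structmap. The `node` value and the upper-casing transformation are assumptions made for the example, not part of my real code.

```clojure
(require '[clojure.string :as string])

;; A single Enlive-style node, written as a plain map for illustration.
(def node {:tag :p, :attrs nil, :content ["some text"]})

;; Iterating over the map with `for` returns a lazy seq of key/value pairs,
;; so the map structure is lost.
(def via-for
  (for [[k v] node]
    [k (if (= k :content) (map string/upper-case v) v)]))

;; update-in changes only the value under :content and returns a map.
(def via-update-in
  (update-in node [:content] (partial map string/upper-case)))

(map? via-for)       ; false
(map? via-update-in) ; true
```

Because `update-in` returns a map, the result should still be usable wherever the original node was.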

jmw

2 Answers

Just put everything back into a map and walk the content recursively.

(defn into-xml
  [coll]
  (let [tag (into {} coll)]
    (update-in tag [:content] (partial map into-xml))))

Note that map is lazy, so the content is only transformed as you access it.

Edit: Whoops, I missed the string parts. Here is a working version:

(defn into-xml
  [coll]
  (if-not (string? coll)
    (let [tag (into {} coll)]
      (update-in tag [:content] (partial map into-xml)))
    coll))
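
A quick usage sketch of the fixed version, assuming a small hand-written input (`entries` is made up for this example and is much shorter than the document in the question). The function is repeated so the snippet can be pasted into a REPL on its own.

```clojure
(defn into-xml
  [coll]
  (if-not (string? coll)
    ;; Not a string: rebuild the map from its entries, then recurse into :content.
    (let [tag (into {} coll)]
      (update-in tag [:content] (partial map into-xml)))
    ;; Strings are text nodes and are returned unchanged.
    coll))

;; A nested sequence of key/value pairs, in the shape shown in the question.
(def entries
  '([:tag :p]
    [:attrs nil]
    [:content ("text" ([:tag :br] [:attrs nil] [:content ()]))]))

(def result (into-xml entries))

(map? result)                    ; true
(map? (second (:content result))) ; true, the nested :br node is a map again
```

The string check matters because calling `into {}` on a string tries to treat each character as a map entry, which is what the first version ran into.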
kotarak
  • Thanks for this. This almost seems to do the trick, except once it gets into the content sequence it errors. Passing my collection into this function, I get: `{:tag :html, :attrs nil, :content (IllegalArgumentException Don't know how to create ISeq from: java.lang.Character clojure.lang.RT.seqFrom (RT.java:487)` I'm working with your solution now to see if I can figure out what's going on. – jmw Jun 14 '12 at 19:37
  • Hey, so as I said, I'm new to clojure. And your solution pointed me in the direction of `update-in`, which I was able to use on the original collection, instead of `for`, and thus I retained the map structure, instead of transforming into a sequence of MapEntries. I put my solution to how I walk through the collection in the end of my question. Thank you! – jmw Jun 15 '12 at 03:25
  • @jmw Ah, yes. The strings. I added a fixed version, but your solution is of course better, to use `update-in` in the first place. Note though, that the `declare` should be at the toplevel. It makes no sense to put it into a function. – kotarak Jun 15 '12 at 05:39
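
As a sketch of the point about `declare`, the two functions from the question could be arranged as follows. `modify-text` is not shown in the question, so the upper-casing version here is a hypothetical stand-in, and the `:else` branch is an addition that passes any other item through unchanged instead of returning nil.

```clojure
(require '[clojure.string :as string])

;; Forward declaration at the top level, so modify-or-go-deeper can refer
;; to update-content before it is defined.
(declare update-content)

(defn modify-text
  "Hypothetical stand-in for the real text transformation; upper-cases s."
  [s]
  (string/upper-case s))

(defn modify-or-go-deeper
  "If item is a map, updates its content; if it is a string, modifies it."
  [item]
  (cond
    (map? item)    (update-content item)
    (string? item) (modify-text item)
    :else          item)) ; assumption: leave anything else untouched

(defn update-content
  "Calls modify-or-go-deeper on each element of the :content sequence."
  [node]
  (update-in node [:content] (partial map modify-or-go-deeper)))

;; Small made-up node tree for illustration.
(def page
  {:tag :div, :attrs {:id "wrap"},
   :content ["\n" {:tag :h1, :attrs nil, :content ["header"]}]})

(def result (update-content page))
```

The structure stays a nested map throughout, so no conversion back from a sequence of entries is needed.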

Try

(def mp '([:tag :html] [:attrs nil] [:content
    (""
    ([:tag :head] [:attrs nil] [:content
        ("\n\t\t"
        ([:tag :title] [:attrs nil] [:content ("page title")])
        "\n\t\t")])
        "\n\t"
        ([:tag :body] [:attrs nil] [:content
            ("\n\t\t"
            ([:tag :div] [:attrs {:id "wrapper"}] [:content
            ("\n\t\t  "
            ([:tag :h1] [:attrs nil] [:content
                ("\n  \t\t\tpage title"
                ([:tag :br] [:attrs nil] [:content ()])
                "\n  \t\t\tand more title\n  \t\t")])
                "\n  \t\t"
                ([:tag :p] [:attrs nil] [:content
                    ("\n  \t\tSome paragraph text"
                    ([:tag :img] [:attrs {:src "images/image.png", :id "image"}] [:content nil])
                    "\n  \t\t")])
            "\n\t\t")]
            "\n\t     \n\t\t"))]
        "\n\n"))]))

(clojure.walk/postwalk (fn [x]
                         (if (and (list? x) (vector? (first x)))
                           (into {} x)
                           x))
                       mp)

It will throw an error, but if you change your input to

([:tag :html]
 [:attrs nil]
 [:content
  (""
   ([:tag :head]
    [:attrs nil]
    [:content
     ("\n\t\t"
      ([:tag :title] [:attrs nil] [:content ("page title")])
      "\n\t\t")])
   "\n\t"
   ([:tag :body]
    [:attrs nil]
    [:content
     ("\n\t\t"
      ([:tag :div]
       [:attrs {:id "wrapper"}]
       [:content
        ("\n\t\t  "
         ([:tag :h1]
          [:attrs nil]
          [:content
           ("\n  \t\t\tpage title"
            ([:tag :br] [:attrs nil] [:content ()])
            "\n  \t\t\tand more title\n  \t\t")])
         "\n  \t\t"
         ([:tag :p]
          [:attrs nil]
          [:content
           ("\n  \t\tSome paragraph text"
            ([:tag :img]
             [:attrs {:src "images/image.png", :id "image"}]
             [:content nil])
            "\n  \t\t")])
         "\n\t\t")]
       ))]))]))

then it works ok. The difference is that, in the edited input, you're removing the "\n\t\t"-like strings from the same list which contains your key-value pairs. Hope this helps.

Edit: The following worked for me:

(def mp '([:tag :html]
 [:attrs nil]
 [:content
  (""
   ([:tag :head]
    [:attrs nil]
    [:content
     ("\n\t\t"
      ([:tag :title] [:attrs nil] [:content ("page title")])
      "\n\t\t")])
   "\n\t"
   ([:tag :body]
    [:attrs nil]
    [:content
     ("\n\t\t"
      ([:tag :div]
       [:attrs {:id "wrapper"}]
       [:content
        ("\n\t\t  "
         ([:tag :h1]
          [:attrs nil]
          [:content
           ("\n  \t\t\tpage title"
            ([:tag :br] [:attrs nil] [:content ()])
            "\n  \t\t\tand more title\n  \t\t")])
         "\n  \t\t"
         ([:tag :p]
          [:attrs nil]
          [:content
           ("\n  \t\tSome paragraph text"
            ([:tag :img]
             [:attrs {:src "images/image.png", :id "image"}]
             [:content nil])
            "\n  \t\t")])
         "\n\t\t")]
       ))]))]))

(clojure.walk/postwalk (fn [x]
                         (if (and (list? x) (vector? (first x)))
                           (into {} x)
                           x))
                       mp)

Try copy and pasting it in a repl. You should get the following:

{:tag :html,
 :attrs nil,
 :content
 (""
  {:tag :head,
   :attrs nil,
   :content
   ("\n\t\t"
    {:tag :title, :attrs nil, :content ("page title")}
    "\n\t\t")}
  "\n\t"
  {:tag :body,
   :attrs nil,
   :content
   ("\n\t\t"
    {:tag :div,
     :attrs {:id "wrapper"},
     :content
     ("\n\t\t  "
      {:tag :h1,
       :attrs nil,
       :content
       ("\n  \t\t\tpage title"
        {:tag :br, :attrs nil, :content ()}
        "\n  \t\t\tand more title\n  \t\t")}
      "\n  \t\t"
      {:tag :p,
       :attrs nil,
       :content
       ("\n  \t\tSome paragraph text"
        {:tag :img,
         :attrs {:src "images/image.png", :id "image"},
         :content nil}
        "\n  \t\t")}
      "\n\t\t")})})}
higginbotham
  • java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassCastException: java.lang.Character cannot be cast to java.util.Map$Entry – higginbotham Jun 13 '12 at 23:44
  • Hey, I'm actually not getting an error, so it's possible I pasted in the input incorrectly. But this also doesn't seem to transform the collection at all, it's still returning the same lazyseq of MapEntries. Like I said, I'm new to clojure, so I'm not sure if I'm using this the right way, but I shoved it into a function, like: `(defn retransform [mp] (clojure.walk/postwalk (fn [x] (if (and (list? x) (vector? (first x))) (into {} x) x)) mp))` – jmw Jun 14 '12 at 19:28
  • Hmm it's working for me. I've updated my post with an example that you should be able to just copy and paste. – higginbotham Jun 15 '12 at 10:18
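
One possible explanation for the difference seen in the comments above, offered as an assumption rather than a confirmed diagnosis: `list?` is true for the quoted literal lists in the answer but false for a lazy seq, so the walk would leave a lazy seq of MapEntries untouched. A minimal sketch that checks `seq?` instead (`entries->map` and `lazy-entries` are hypothetical names for this example):

```clojure
(require '[clojure.walk :as walk])

(defn entries->map
  "Hypothetical helper: turns any seq whose first element is a vector
  (or map entry) back into a map; leaves everything else untouched."
  [x]
  (if (and (seq? x) (vector? (first x)))
    (into {} x)
    x))

;; A lazy seq of map entries, similar in shape to what `for` over a node produces.
(def lazy-entries
  (for [entry {:tag :p, :attrs nil, :content (list "some text")}]
    entry))

(list? lazy-entries) ; false, so a list?-based check skips it
(seq? lazy-entries)  ; true

(def restored (walk/postwalk entries->map lazy-entries))

(map? restored) ; true
```

Since `seq?` is also true for ordinary lists, the same check should cover the quoted input from the answer as well.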