I cannot find any information on how to parse XML documents and access their elements.

I have found two ways to parse an XML document:

(clojure.zip/xml-zip (clojure.xml/parse file))

and

(parse-seq file)

but I can't seem to find any information on how to process the resulting structure.

The source file refers to zip-query.clj for how to query the result, but that seems to be missing too.

Hamza Yerlikaya
  • The examples from zip-query.clj can be found in xml_test.clj, which is located in src/test/clojure/clojure/data/zip in the GitHub repo for clojure.data.zip – Gavilan Comun Apr 14 '12 at 07:30
  • Funny, I asked this as well, and got excellent answers from some of the most helpful people on SO. However, even after running the resulting parsed data.xml through one of the suggestions, the resultant structure still does not make a lot of sense to me. I'm going to look at your xml-zip, unless data.xml is its successor. – octopusgrabbus Jun 29 '12 at 12:05
  • also see [`clojure.data.xml`](https://github.com/clojure/data.xml) – Jeremy Field May 09 '20 at 01:15

2 Answers

Suppose you have the following xml to parse in your file:

<high-node>
   <low-node>my text</low-node>
</high-node>

you load clojure.xml:

user=> (use 'clojure.xml)

when parsed, the xml will have the following structure:

{:tag :high-node, :attrs nil, :content [{:tag :low-node, :attrs nil, :content ["my text"]}]}

and then you can seq over the content of the file to get the content of the low-node:

user=> (for [x (xml-seq (parse (java.io.File. file)))
             :when (= :low-node (:tag x))]
         (first (:content x)))

("my text")

Similarly, if you wanted access to the entire map of information on low-node, you would change the :when predicate to (= :high-node (:tag x)), so that the first item in the content of high-node is returned:

user=> (for [x (xml-seq (parse (java.io.File. file)))
             :when (= :high-node (:tag x))]
         (first (:content x)))

({:tag :low-node, :attrs nil, :content ["my text"]})

This works because keywords can operate as functions that look themselves up in a map. See "Questions about lists and other stuff in Clojure" and "Data Structures: Keywords".
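To try this without a file on disk, here is a minimal sketch that parses the same XML from a string. The names `parse-str`, `parsed`, and `texts-of` are made up for illustration and are not part of clojure.xml; the only library calls used are `clojure.xml/parse` (which also accepts an InputStream) and `xml-seq`.

```clojure
(require '[clojure.xml :as xml])
(import 'java.io.ByteArrayInputStream)

;; Hypothetical helper (not part of clojure.xml): parse XML held in a string.
(defn parse-str [s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

(def parsed
  (parse-str "<high-node><low-node>my text</low-node></high-node>"))

;; A keyword in function position looks itself up in the map.
(:tag parsed)                     ; :high-node
(:tag (first (:content parsed)))  ; :low-node

;; xml-seq walks every node in the tree, text nodes included.
;; (:tag "some string") is nil, so text nodes never match the filter.
(defn texts-of [tag root]
  (for [x (xml-seq root)
        :when (= tag (:tag x))]
    (first (:content x))))

(texts-of :low-node parsed)       ; ("my text")
```

Because `for` is lazy, wrap the call in `(doall ...)` if the result must be realized at a specific point, for example before an input stream is closed.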

Pinochle
  • Very nice explanation! I'm going to try this. – Ralph Nov 25 '10 at 13:52
  • (disclaimer: I'm a clojure newb)... But I found this worked for me in the REPL, and I couldn't get it to work in a file (my ignorance is to blame). The clojure.data.zip.xml worked for me in a file without modification. – wonderfulthunk Apr 30 '12 at 18:04
  • Often, things that work in the REPL and not in a file are "lazily evaluated" things... Try this [section in the tutorial](http://java.ociweb.com/mark/clojure/article.html#Sequences). Short version: put `(doall ...)` around the `(for)`. – rescdsk Aug 08 '12 at 04:20

The above answer works, but I find it a lot easier to use clojure.data.zip.xml (which used to be clojure.contrib.zip-filter.xml prior to Clojure 1.3).

file:

myfile.xml:

<songs>
  <track id="t1"><name>Track one</name></track>
  <track id="t2"><name>Track two</name></track>
</songs>

code:

; Clojure 1.3
(ns example
  (:use [clojure.data.zip.xml :only (attr text xml->)]) ; dep: see below
  (:require [clojure.xml :as xml]
            [clojure.zip :as zip]))

(def xml (xml/parse "myfile.xml"))
(def zipped (zip/xml-zip xml))
(xml-> zipped :track :name text)       ; ("Track one" "Track two")
(xml-> zipped :track (attr :id))       ; ("t1" "t2")

Unfortunately, you need to pull in a dependency on data.zip to get this nice read/filter functionality. It's worth the dependency :) In lein it would be (as of 17-Aug-2013):

[org.clojure/data.zip "0.1.1"]

And as for docs for data.zip.xml ... I just look at the relatively small source file here to see what is possible. Another good SO answer here, too.
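As a rough illustration of what else that namespace offers, the sketch below uses `attr=` (keep only locations whose attribute has a given value) and `xml1->` (return the first match instead of a sequence). It assumes the data.zip dependency is on the classpath and a Clojure version that supports `:refer` in `:require`; the namespace name `example.filters` and the var `songs-xml` are made up for the example, and the XML is parsed from a string rather than from myfile.xml.

```clojure
(ns example.filters
  (:require [clojure.data.zip.xml :refer [attr attr= text xml-> xml1->]]
            [clojure.xml :as xml]
            [clojure.zip :as zip])
  (:import [java.io ByteArrayInputStream]))

;; Same content as myfile.xml, inlined so the example is self-contained.
(def songs-xml
  (str "<songs>"
       "<track id=\"t1\"><name>Track one</name></track>"
       "<track id=\"t2\"><name>Track two</name></track>"
       "</songs>"))

(def zipped
  (-> (.getBytes songs-xml "UTF-8")
      (ByteArrayInputStream.)
      xml/parse
      zip/xml-zip))

;; Name of the track whose id attribute is "t2"; xml1-> returns one value.
(xml1-> zipped :track (attr= :id "t2") :name text)   ; "Track two"

;; xml-> with the same filter returns a sequence of matches.
(xml-> zipped :track (attr= :id "t1") (attr :id))    ; ("t1")
```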

overthink
  • I don't understand why (xml-> zipped :track :name text) works, but neither (xml-> zipped :songs :track :name text) nor (xml-> zipped :name text) does. I'm not sure why you have to specify a certain level of nesting for some tags but not others. –  Jun 05 '13 at 02:50
  • @RyanMoore It's a [zipper](http://www.haskell.org/haskellwiki/Zipper). Zippers are context-sensitive: they have a current node, and you must give them traversal instructions relative to that context node. Apparently the default context node is the root, which makes sense. – Jimmy Hoffa Sep 19 '13 at 20:59