
I have several XML files which have a similar structure but with some differences that I cannot overlook. They are all TEI documents.

I am looking for a way to outline the main structure.

Take the following text as an example:

<text xmlns="http://www.tei-c.org/ns/1.0" xml:id="d1">
<body xml:id="d2">
<div1 type="book" xml:id="d3">
<head>Songs of Innocence</head>
<pb n="4"/>
<div2 type="poem" xml:id="d4">
<head>Introduction</head>
<lg type="stanza">
<l>Piping down the valleys wild, </l>
<l>Piping songs of pleasant glee, </l>
<l>On a cloud I saw a child, </l>
<l>And he laughing said to me: </l>
</lg>

I would like to collapse sibling nodes of the same type so that each repeating structure appears only once, like this:

<body xml:id="d2">
<div1 type="book" xml:id="d3">
<head>Songs of Innocence</head>
<pb n="4"/>
<div2 type="poem" xml:id="d4">
<head>Introduction</head>
<lg type="stanza">
<l>...</l>
</lg>
<lg>...</lg>

So, basically, I want to reduce each XML document to its most basic structure. That way I can figure out how to convert them properly using XSLT.
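Here is a minimal sketch of that reduction in Python. It assumes only the standard library's `xml.etree.ElementTree` and a well-formed input file. The script prints an indented outline in which only the first child of each element name is expanded, so repeated siblings such as `l` or `lg` collapse to a single entry. `skeleton` and `display_name` are hypothetical helper names and are not part of any TEI tool.

```python
import sys
import xml.etree.ElementTree as ET

# ElementTree reports xml:id as "{http://www.w3.org/XML/1998/namespace}id".
XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def display_name(name):
    """Turn an ElementTree tag/attribute name into a short readable name."""
    if name.startswith(XML_NS):
        return "xml:" + name[len(XML_NS):]
    return name.rsplit("}", 1)[-1]  # drop any other '{namespace}' prefix


def skeleton(elem, depth=0):
    """Return indented outline lines for elem, expanding only the first
    child of each element name so repeated siblings collapse to one."""
    line = "  " * depth + display_name(elem.tag)
    attrs = sorted(display_name(a) for a in elem.attrib)
    if attrs:
        line += " [" + " ".join(attrs) + "]"
    lines = [line]
    seen = set()
    for child in elem:
        name = display_name(child.tag)
        if name in seen:
            continue  # an earlier sibling with this name was already expanded
        seen.add(name)
        lines.extend(skeleton(child, depth + 1))
    return lines


if __name__ == "__main__" and len(sys.argv) > 1:
    root = ET.parse(sys.argv[1]).getroot()
    print("\n".join(skeleton(root)))
```

Run it as `python skeleton.py file.xml` (the script name is arbitrary). Because only the first occurrence of each child name is kept, the outline will not show a later sibling whose structure differs from the first one.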

kjhughes
Angelo

2 Answers


Here are some options for viewing your XML in a tree structure:

  1. Open the XML in a web browser and get an outline view with collapsible elements.
  2. Open the XML in the graphical view of Oxygen, QTAssistant, or XMLSpy.
  3. Use Graphviz or a DotML Ant build to create your own visual representations.
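For option 3, you can get a picture that already collapses repeated structures by emitting one Graphviz node per element name and one edge per distinct parent/child pair. Below is a minimal Python sketch. It assumes Graphviz's `dot` command is installed to render the output, and `structure_edges` and `to_dot` are hypothetical names.

```python
import sys
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def structure_edges(root):
    """Collect the distinct (parent, child) element-name pairs in the tree."""
    edges = set()
    for parent in root.iter():
        for child in parent:
            edges.add((local_name(parent.tag), local_name(child.tag)))
    return edges


def to_dot(edges):
    """Render the edges as a Graphviz digraph in DOT syntax."""
    lines = ["digraph structure {"]
    for parent, child in sorted(edges):
        lines.append(f'  "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(to_dot(structure_edges(ET.parse(sys.argv[1]).getroot())))
```

For example, run `python xml2dot.py file.xml | dot -Tsvg -o structure.svg` (the script name is hypothetical). Because the nodes are element names rather than individual elements, a nested structure such as a `div` inside a `div` appears as a loop.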

Note, however, that you'll need to clean up your markup first. What you show doesn't qualify as XML because it's missing end tags and lacks a single root element, and XML has to be well-formed.
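To find out which files still need cleaning up, you can let a parser report the first well-formedness error in each one. Here is a minimal sketch using Python's standard `xml.etree.ElementTree`. `check_well_formed` is a hypothetical helper name.

```python
import sys
import xml.etree.ElementTree as ET


def check_well_formed(source):
    """Return None if source (a path or file object) parses as XML,
    otherwise a message naming the first well-formedness error."""
    try:
        ET.parse(source)
    except ET.ParseError as err:
        # str(err) already includes the line and column of the error.
        return f"{source}: {err}"
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    problems = [m for m in map(check_well_formed, sys.argv[1:]) if m]
    print("\n".join(problems) if problems else "all files are well-formed")
    sys.exit(1 if problems else 0)
```

Passing several files at once (for example, a shell glob such as `*.xml`) lists every file that fails, and the exit status is non-zero if any file fails.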

kjhughes

Using Perl's XML::DT module (install it with `apt-get install libxml-dt-perl` if it is not already installed), the command `mkxmltype file.xml` returns a compact description of the XML structure. Example:

$ mkxmltype -lines=1000  a.xml 

# text ...Fri Feb 26 17:56:24 2016
text    =>  body * xml:id
body    =>  div1 * xml:id
div1    =>  tup(div2, pb, head) * type * xml:id
div2    =>  tup(head, lg) * type * xml:id
pb  =>  empty * n
head    =>  text
lg  =>  seq(l) * type
l   =>  text
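If installing XML::DT is not an option, you can write a rough approximation of this kind of summary with Python's standard library. The sketch below is not a port of `mkxmltype`. It does not distinguish sequences from tuples the way `seq(...)` and `tup(...)` do. Instead, for each element name, it merges every child name, attribute name, and text occurrence seen anywhere in the document. `summarize` and `display_name` are hypothetical names.

```python
import sys
import xml.etree.ElementTree as ET
from collections import defaultdict

XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def display_name(name):
    """Drop ElementTree's '{namespace}' prefix, but keep the xml: prefix."""
    if name.startswith(XML_NS):
        return "xml:" + name[len(XML_NS):]
    return name.rsplit("}", 1)[-1]


def summarize(root):
    """One line per element name: the child names, attribute names and
    text flag observed across every occurrence of that element."""
    children = defaultdict(dict)  # dict used as an insertion-ordered set
    attrs = defaultdict(set)      # keys are filled in document order
    has_text = defaultdict(bool)
    for elem in root.iter():
        name = display_name(elem.tag)
        attrs[name].update(display_name(a) for a in elem.attrib)
        if elem.text and elem.text.strip():
            has_text[name] = True
        for child in elem:
            children[name][display_name(child.tag)] = None
            if child.tail and child.tail.strip():
                has_text[name] = True  # mixed content
    lines = []
    for name in attrs:
        parts = list(children[name])
        if has_text[name]:
            parts.append("text")
        line = f"{name} => {', '.join(parts) or 'empty'}"
        for attr in sorted(attrs[name]):
            line += f" * {attr}"
        lines.append(line)
    return lines


if __name__ == "__main__" and len(sys.argv) > 1:
    print("\n".join(summarize(ET.parse(sys.argv[1]).getroot())))
```

Because every occurrence is merged, an element whose children vary between files or sections shows the union of what was seen. That union is often what you need when deciding which XSLT templates to write.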
JJoao
  • This tool cannot parse files that contain named character entities. I get lots of errors like `c.xml:882: parser error : Entity 'eacute' not defined Perché` – Angelo Feb 28 '16 at 17:18
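If the undefined entities are standard HTML ones such as `&eacute;`, one possible workaround is to expand them into the characters they stand for before handing the file to a parser. Here is a minimal Python sketch. It assumes UTF-8 input and that every undefined entity appears in the HTML 4 list exposed by `html.entities.name2codepoint`. `expand_html_entities` is a hypothetical helper. Because this is a plain text substitution, it would also rewrite entity-like text inside CDATA sections and comments.

```python
import html.entities
import re
import sys

# Entities predefined by XML itself; these must stay escaped.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_html_entities(text):
    """Replace HTML 4 named entity references (e.g. &eacute;) with the
    characters they denote; XML's own entities and unknown names are kept."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in html.entities.name2codepoint:
            return match.group(0)
        return chr(html.entities.name2codepoint[name])
    return ENTITY_REF.sub(replace, text)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        expanded = expand_html_entities(f.read())
    sys.stdout.buffer.write(expanded.encode("utf-8"))
```

For example, `python expand_entities.py c.xml > c-expanded.xml` writes an expanded copy (the script name is hypothetical), and you can then run `mkxmltype` on that copy. Any entity that is not in the HTML list is left as-is and will still be reported by the parser.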