I need to convert the metadata in one XML document into Dublin Core metadata in another XML document. Here's the first XML document:

<?xml version="1.0" encoding="UTF-8"?>
<document xmlns:xlink="http://www.w3.org/1999/xlink">

<description>
    <title>Letter from Waldemar Schultze to Jennie Schultze</title>
    <creator type="author">
        <name type="personal">Schultze, Waldemar</name>
    </creator>
    <date>1943-06-30</date>
    <source>Special Collections and University Archives, W. E. B. Du Bois Library,
        University of Massachusetts Amherst.</source>
    <citation>Robert and Waldemar Schultze Papers (MS 528). Special Collections and
        University Archives, W.E.B. Du Bois Library, University of
        Massachusetts Amherst.</citation>
</description>

<text>

    <header type="letterhead">

        <imageGroup>
            <image xlink:href="mums528-i001-001.png"/>
            <caption>page 1</caption>
        </imageGroup>
        <imageGroup>
            <image xlink:href="mums528-i001-002.png"/>
            <caption>page 2</caption>
        </imageGroup>

        <organization>Unites States Disciplinary Barracks</organization>
        <location>Fort Leavenworth, Kansas</location>
        <date format="M/DD/YY">6/30/43</date>
        <recipient>
            <name type="personal">Mrs. W.J. Schultze</name>
            <address>875 Richmond Av., Buffalo, N.Y.</address>
            <relation>Mother</relation>
        </recipient>
    </header>
    <body>
        <salutation>Dear Mother,</salutation>
        <p><line>This is the first letter I have had</line> 
            <line>an opportunity to write you since leaving Fort</line> 
            <line>Jay, and I know you must be anxious to hear from me.</line></p>
        <p><line>Bob and I are both feeling as well as</line> 
            <line>can be expected considering our present cir-</line>
            <line>cumstances. We both have high blood</line>
            <line>pressure, mine has been 160/100 for the past</line> 
            <line>2 days, and Bob's 158/96, but my sinus</line>
            <line>infection has not caused me quite so much</line> 
            <line>trouble since leaving N.Y. State. I believe</line> 
            <line>the air is dryer here and is responsible</line> 
            <line>for any alleviation that has taken place.</line></p>
        <p><line>While a prisoner here remains in their</line> 
            <line>so-called 1st grade, he is able to write</line> 
            <line>twice a week, in second grade once a week,</line> 
            <line>and in third grade once a month. These</line> 
            <line>grades refer to classifications that ostensibly</line>
            <line>are for conduct while here.  It is quite possible</line> 
            <line>to lose a conduct rating, as I understand it,</line> 
            <line>by not having a perpetually rusting tin cup polished</line> 
            <pb n="2"/>
            <line>brightly for daily inspection, although the tin plating long ago dis-</line>
            <line>appeared and the cup is rusty again within 2 hours after wetting.</line></p>
        <p><line>The food here is good and is well-cooked,</line> 
            <line>with one exception, the gravy, which is nothing but</line> 
            <line>flour, water, and bacon grease, Strangely enough, how-</line>
            <line>ever, no condiments, not even salt, are provided on</line> 
            <line>the table, to the detriment of otherwise very good</line> 
            <line>meals.  While meat here is unrationed and is plentiful,</line> 
            <line>toilet paper; believe it or not, is rationed.  A</line> 
            <line>5¢ roll must last a prisoner 45 days, or else -- ?</line>
            <line>Perhaps, however, a prisoner can purchase additional</line> 
            <line>if it should be necessary.</line></p>
        <p><line>Please see that my subscriptions are transferred</line> 
            <line>here as soon as possible from Fort Jay. Give Florence</line> 
            <line>and Helen my regards, and thank Joe for his</line> 
            <line>efforts in my behalf in managing my business.</line> 
            <line>Find out from Joe how tube deliveries are at the</line> 
            <line>present time, first to satisfy my curiosity; and</line> 
            <line>also let me know if you are receiving your</line>
            <line>remittance regularly from him.  If he is not</line> 
            <line>taking care of your support in accordance with</line> 
            <line>the instructions I left him, I wish to know it,</line> 
            <line>so I can write, and correct the matter.</line>  
            <line>You can tell Joe to subscribe to Electronics</line> 
            <line>magazine for me and send it to this address</line>
            <line>direct from the publisher.  He should also have a copy</line> 
            <line>of Palmer's "Calculus for Home Study," sent me by the publisher,</line> 
            <line>whose name he can obtain from Ulbrichs.  In future letters I'll</line> 
            <line>copy the "Prisoner's Handbook" issued here, and the</line> 
            <line>contents of the detached letter form stub.</line> 
        </p>
        <valediction>Love, Waldemar</valediction>
    </body>
</text>

</document>

Since I only need a few of the elements in the output XML, my XSL uses only a few templates, and I've left the remaining Dublin Core elements empty:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://relaxng.org/ns/structure/1.0">

<xsl:output method="xml" indent="yes"/>

<xsl:template match="/">
    <dc><xsl:apply-templates select="document"></xsl:apply-templates></dc>
</xsl:template>

<xsl:template match="description/creator/name">
    <creator><xsl:value-of select="."/></creator>
</xsl:template>

<xsl:template match="description/title">
    <title><xsl:value-of select="."/></title>
</xsl:template>

<xsl:template match="description/date">
    <date><xsl:value-of select="."/></date>
</xsl:template>

<xsl:template match="description/source">
    <publisher><xsl:value-of select="."/></publisher>
</xsl:template>

    <source></source>
    <description></description>
    <subject></subject>
    <coverage></coverage>
    <contributor></contributor>
    <identifier></identifier>
    <relation></relation>
    <rights></rights>
    <language></language>
    <type></type>
    <format></format>

</xsl:stylesheet>

The document validates, and the correct metadata appears in the output Dublin Core elements. The problem is that the rest of the text from the source XML, which I don't need, also appears after the <publisher> element. How can I make all that other text disappear from the output?

Misenus

2 Answers

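That happens because the built-in template rules process every node that none of your custom templates match, and the built-in rule for text nodes copies their text to the output. You can fix it with minimal changes to your current XSL by adding the following template, which visits only child elements and so never reaches the text nodes: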

<xsl:template match="*">
    <xsl:apply-templates select="*"/>
</xsl:template>

For reference: Why does XSLT output all text by default?
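
To make the explanation above concrete, here is a sketch of the built-in template rules from the XSLT 1.0 specification, written out as explicit templates. A processor supplies these rules implicitly, so you would not normally type them; a stylesheet containing only these templates should behave like an empty stylesheet and output the concatenated text of the input.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Built-in rule for the root node and elements: recurse into all child nodes. -->
<xsl:template match="*|/">
    <xsl:apply-templates/>
</xsl:template>

<!-- Built-in rule for text and attribute nodes: copy the string value to the output. -->
<xsl:template match="text()|@*">
    <xsl:value-of select="."/>
</xsl:template>

<!-- Built-in rule for comments and processing instructions: output nothing. -->
<xsl:template match="processing-instruction()|comment()"/>

</xsl:stylesheet>
```

The template suggested above replaces the first rule for elements with one that selects only child elements (`select="*"`), so the second rule is never applied to the letter's text nodes.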

har07

Or simply do this:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
xmlns="http://relaxng.org/ns/structure/1.0">
<xsl:output method="xml" indent="yes"/>

<xsl:template match="/document">
    <dc>
        <creator><xsl:value-of select="description/creator/name"/></creator>
        <title><xsl:value-of select="description/title"/></title>
        <date><xsl:value-of select="description/date"/></date>
        <publisher><xsl:value-of select="description/source"/></publisher>
        <source/>
        <description/>
        <subject/>
        <coverage/>
        <contributor/>
        <identifier/>
        <relation/>
        <rights/>
        <language/>
        <type/>
        <format/>
    </dc>
</xsl:template>

</xsl:stylesheet>

Your multitude of templates serves no purpose here other than making the stylesheet less readable (this assumes that each description element appears only once).

Note that the empty literal result elements must be inside a template in order to appear in the output.

michael.hor257k
  • This works, but is there a reason (besides cleaner code) to use it rather than the catch-all template suggested by har07? – Misenus Aug 02 '15 at 11:27
  • @Misenus You should ask the opposite question: is there a good reason to apply templates to elements that you do **not** want, then process them by an empty template? The answer to this is of course not. Even with your approach, you could have applied templates only to the 4 elements that you do want. -- Note that you do not have a template matching `document` - thus the built-in template rules apply templates to **all** descendants of `document`. – michael.hor257k Aug 02 '15 at 11:50
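
The point made in the last comment, applying templates only to the four wanted elements, could look roughly like the following. This is a sketch that keeps the asker's per-element templates and default namespace declaration unchanged from the question; only the `select` expression in the root template is new, and it has not been run against the full source file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://relaxng.org/ns/structure/1.0">
<!-- The default namespace above is carried over as-is from the question. -->

<xsl:output method="xml" indent="yes"/>

<!-- Select only the four wanted nodes, so no other element or text node
     is ever handed to the built-in template rules. -->
<xsl:template match="/">
    <dc>
        <xsl:apply-templates select="document/description/title
                                   | document/description/creator/name
                                   | document/description/date
                                   | document/description/source"/>
        <!-- Empty literal result elements must sit inside a template. -->
        <source/>
        <description/>
        <subject/>
        <coverage/>
        <contributor/>
        <identifier/>
        <relation/>
        <rights/>
        <language/>
        <type/>
        <format/>
    </dc>
</xsl:template>

<xsl:template match="description/creator/name">
    <creator><xsl:value-of select="."/></creator>
</xsl:template>

<xsl:template match="description/title">
    <title><xsl:value-of select="."/></title>
</xsl:template>

<xsl:template match="description/date">
    <date><xsl:value-of select="."/></date>
</xsl:template>

<xsl:template match="description/source">
    <publisher><xsl:value-of select="."/></publisher>
</xsl:template>

</xsl:stylesheet>
```

Because the selected nodes form a union, they are processed in document order, so `title` is emitted before `creator` here. If a fixed output order matters, the single-template version in the answer gives direct control over it.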