TL;DR: I need to move an element so that it becomes the first child of another element, using XQuery.

I'm working on an XML TEI critical edition (an alignment of three different versions of a text) and need to update my document to show the base version of the text. At the moment, my edition looks like:

<app corresp="#orth">
    <rdg wit="#atw">feust</rdg>
    <rdg wit="#brl">fust</rdg>
    <rdg wit="#brn">fut</rdg>
</app>

As you can see, the variations are signalled within an <app> element in which the versions of the text are encoded each in an <rdg> element.

What I need to do: transform the <rdg> element that has an attribute @wit="#brl" into a <lem> element, and move it so that it is the first of the three elements in the <app>. So basically, transform the above example into:

<app corresp="#orth">
    <lem wit="#brl">fust</lem>
    <rdg wit="#atw">feust</rdg>
    <rdg wit="#brn">fut</rdg>
</app>

The document is pretty long, so I thought of automating the process using XQuery. However, I'm having trouble.

So far, I've managed to transform the <rdg> into a <lem>, using this query :

let $doc := db:open("#...")
let $brl := $doc//rdg[contains(@wit, "#brl")]

for $el in $brl
return rename node $el as "lem"

Now, I need to move the <lem> as first child of <app>. That's the part with which I'm having trouble. All I've managed to do so far is to copy the <lem> as first child of <app>, but by only returning the <app> elements and not the entire document. Here's the query I used:

let $doc := db:open("#...")
let $app := $doc//app

for $el in $app
return 
  copy $target := $el
  modify (
    insert node $target/lem as first into $target
    )
  return $target

The following steps I need to achieve are:

  • Managing to copy the <lem> as first child of <app> while returning the whole document (I tried to do this with an if...else, but with no success).
  • Deleting the <lem> elements that are not the first children of an <app> (the above query duplicates the <lem>, so we end up with two <lem> per <app>).

I don't have a lot of experience with XQuery, apart from 10 hours of class, so a little bit of help would be highly appreciated! Thanks so much in advance.

EDIT

Christian's answer (see code below) works, but only returns the modified elements, not the entire updated document:

return $app update {
  delete node ./lem,
  insert node ./lem as first into .
}

I would need to update the whole document with the updated elements. I haven't managed to export the document with the updates. Another thing I've tried is:

  • iterating through the whole document
  • if the elements are <app>, modifying and returning them
  • else, returning the unchanged elements:
for $el in $doc//*
if ($el = $app)
  return $app update {
    delete node ./lem,
    insert node ./lem as first into .
  }
else return $el

The query above has an obvious mistake that I can't seem to get rid of: you can't just return an unchanged element in an else branch. The question now is: how can I update the whole document with the updated <app> elements?

paulhector
  • I don't think XQuery Update allows two update operations on the same node in one "transaction"; I think you need to separate the two "transactions". Thus, if the `rename` transaction works as you want it, use a second and third, separate transaction/query for the insertion and removal. – Martin Honnen Jan 12 '22 at 16:20
  • I'm already proceeding like this. I'm looking to do 3 transactions: (1) `rename` the `<rdg>` to `<lem>`, (2) `insert` the `<lem>` element as first child of `<app>`, (3) `delete` the redundant `<lem>` element. I'm stuck on the second transaction, though – paulhector Jan 13 '22 at 17:38

2 Answers

It might be easier to perform the updates in two steps:

  1. rename the target nodes
  2. delete and reinsert the renamed nodes

Here’s one possible solution:

let $doc := document {
  <app corresp="#orth">
    <rdg wit="#atw">feust</rdg>
    <rdg wit="#brl">fust</rdg>
    <rdg wit="#brn">fut</rdg>
  </app>
}
let $updated1 := (
  copy $target := $doc
  modify (
    for $app in $target//app
    return rename node $app/rdg[@wit = '#brl'] as 'lem'
  )
  return $target
)
let $updated2 := (
  copy $target := $updated1
  modify (
    for $app in $target//app
    return (
      delete node $app/lem,
      insert node $app/lem as first into $app
    )
  )
  return $target
)
return $updated2

The query returns the following output:

<app corresp="#orth">
  <lem wit="#brl">fust</lem>
  <rdg wit="#atw">feust</rdg>
  <rdg wit="#brn">fut</rdg>
</app>

As you see in the second block, a node that’s deleted will be reinserted. That’s due to the semantics of XQuery Update: All update statements refer to the original XML node, and will eventually be executed in a defined order in a final step (look for Pending Update List to get more information).
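
The same semantics explain why the rename and the move are split into two steps: an insert issued in the same step as the rename would copy the original node, which is still named rdg. Below is a minimal single-pass sketch, assuming it is acceptable to build a new <lem> from the attributes and children of the old <rdg> instead of renaming it. This is a variation for illustration, not the approach used above, and it has not been run against the real edition:

```xquery
let $doc := document {
  <app corresp="#orth">
    <rdg wit="#atw">feust</rdg>
    <rdg wit="#brl">fust</rdg>
    <rdg wit="#brn">fut</rdg>
  </app>
}
return
  copy $target := $doc
  modify (
    for $rdg in $target//app/rdg[contains(@wit, '#brl')]
    return (
      (: build the <lem> from the old <rdg>'s attributes and children :)
      insert node element lem { $rdg/@*, $rdg/node() } as first into $rdg/..,
      (: remove the original <rdg>; both updates are applied together at the end :)
      delete node $rdg
    )
  )
  return $target
```

If several <rdg> elements in one <app> match the predicate, the relative order of the inserted <lem> elements is not guaranteed.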

As your query implies that you are using BaseX, I would recommend the use of the handy update expression, which comes with a more compact syntax. In addition, it allows you to chain multiple updates:

...
return $doc update {
  for $app in .//app
  return rename node $app/rdg[@wit = '#brl'] as 'lem'
} update {
  for $app in .//app
  return (
    delete node $app/lem,
    insert node $app/lem as first into $app
  )
}
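
One thing worth checking if such a query returns the document unchanged: TEI files normally declare the default namespace http://www.tei-c.org/ns/1.0, and an unprefixed step such as app only matches elements in no namespace. The following is a hedged sketch, assuming the real edition is in that namespace; the small TEI wrapper is made up for illustration, and contains() is used because a single rdg may list several witnesses:

```xquery
(: make unprefixed element names, including 'lem' in rename, refer to TEI :)
declare default element namespace "http://www.tei-c.org/ns/1.0";

let $doc := document {
  <TEI>
    <app corresp="#orth">
      <rdg wit="#atw">feust</rdg>
      <rdg wit="#brl #cas">fust</rdg>
      <rdg wit="#brn">fut</rdg>
    </app>
  </TEI>
}
return $doc update {
  (: step 1: rename every rdg that mentions the #brl witness :)
  for $rdg in .//app/rdg[contains(@wit, '#brl')]
  return rename node $rdg as 'lem'
} update {
  (: step 2: move the renamed lem to the front of its own app :)
  for $app in .//app[lem]
  return (
    delete node $app/lem,
    insert node $app/lem as first into $app
  )
}
```

An alternative is the namespace wildcard, for example .//*:app/*:rdg, although the new name given to rename node would still need to be in the TEI namespace.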
Christian Grün
  • Thanks for your answer ! I'm still facing a small issue though, please see the edit of my original post :) – paulhector Jan 14 '22 at 00:19
  • Do you want to update the document in your database, or do you want to return an updated version of the document as result of the query? – Christian Grün Jan 14 '22 at 07:48
  • If possible, return an updated version of the document, but both would be more than fine ! – paulhector Jan 14 '22 at 11:53
  • I have revised my answer. All updates will now be performed on the full document. – Christian Grün Jan 18 '22 at 16:33
  • thank you again for your help ! from my understanding, it should work. however, this new query outputs the whole document, without updating it. i don't really understand what's going on with this ! – paulhector Jan 21 '22 at 11:07
  • I have added example input and the result to my answer. – Maybe the initial question was a bit complex, and writing to the BaseX mailing list may be more productive. – Christian Grün Jan 21 '22 at 15:05

Here's an XSL Transform that accomplishes what you've specified.

It uses an identity transform where you only specify the things you want to change, and everything else is left unmodified.

I've updated your sample input to reflect its truer nature based on your reply in the comments below:

<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <dont-care-about-this foo="foo">
    <leave-this-alone bar="bar">
      <app corresp="#orth">
        <!-- excuse the nonsensical values; point being: no brl, no action -->
        <rdg wit="#atw">fual</rdg>
        <rdg wit="#brn">fuall</rdg>
      </app>
      <app corresp="#orth">
        <!-- lem goes above -->
        <rdg wit="#atw">feust</rdg>
        <rdg wit="#brl #cas">fust</rdg>
        <rdg wit="#brn">fut</rdg>
      </app>
    </leave-this-alone>
  </dont-care-about-this>
</TEI>
  • I added the default namespace xmlns="the-TEI-URI".
  • I added some bogus parent and sibling nodes to the <app> you do care about, to underscore the point that the identity transform only changes what you specify; everything you didn't explicitly specify is implicitly left as-is.

I still consider myself to be kind of a hack when it comes to XSLT, and nothing reinforces that idea more than my lack of understanding of the mechanics of namespaces in XSLT... they frustrate me, and they seem to frustrate a lot of other people who haven't made XPath and XSLT their lives. Still, here are some hacks/magical invocations that work for the above XML. These are from the <xsl:stylesheet> declaration:

    xmlns="http://www.tei-c.org/ns/1.0"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei"
  • xmlns="http://www.tei-c.org/ns/1.0": this ensures new nodes you add, like <lem>, don't look like <lem xmlns="">
  • xmlns:tei="http://www.tei-c.org/ns/1.0": this defines a prefix that all XPath expressions in the XSLT need to use; otherwise XPath assumes an unprefixed name is in no namespace (a blank URI), while all your node names are in the non-blank namespace the-TEI-URI
  • exclude-result-prefixes="tei": this ensures new nodes you add, like <lem>, don't look like <lem xmlns:tei="the-TEI-URI">

With all those in place, all the node names in the XPaths now need the prefix tei:, like tei:app[tei:rdg[contains(@wit, 'brl')]]. The attributes don't, because the attributes in your source XML are unprefixed.

wit_brl.xsl

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.tei-c.org/ns/1.0"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei"
    >
    <xsl:output method="xml" indent="yes" omit-xml-declaration="yes" />

    <!-- Start identity transform -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()" />
        </xsl:copy>
    </xsl:template>

    <!-- match any app with wit-brl -->
    <xsl:template match="tei:app[tei:rdg[contains(@wit, 'brl')]]">
        <xsl:copy>
            <!-- copy app's attribs -->
            <xsl:apply-templates select="@*" />
            <!-- select wit-brl rdg for move -->
            <xsl:apply-templates select="tei:rdg[contains(@wit, '#brl')]" mode="move" />
            <!-- copy *all* rdgs (including wit-brl rdg), except... -->
            <xsl:apply-templates />
        </xsl:copy>
    </xsl:template>

    <!-- ...when wit-brl rdg is found w/out "move" mode, discard it -->
    <xsl:template match="tei:rdg[contains(@wit, '#brl')]" />

    <!-- When wit-brl rdg is found with "move" mode, rename/move -->
    <xsl:template match="tei:rdg[contains(@wit, '#brl')]" mode="move">
        <lem>
            <xsl:apply-templates select="@*|node()" />
        </lem>
    </xsl:template>

</xsl:stylesheet>

Now when I run:

xsltproc wit_brl.xsl input.xml | tidy -q -i -xml --indent-spaces 2

I get:

<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <dont-care-about-this foo="foo">
    <leave-this-alone bar="bar">
      <app corresp="#orth">
        <!-- excuse the nonsensical values; point being: no brl, no action -->
        <rdg wit="#atw">fual</rdg>
        <rdg wit="#brn">fuall</rdg>
      </app>
      <app corresp="#orth">
        <lem wit="#brl #cas">fust</lem>
        <!-- lem goes above -->
        <rdg wit="#atw">feust</rdg>
        <rdg wit="#brn">fut</rdg>
      </app>
    </leave-this-alone>
  </dont-care-about-this>
</TEI>
Zach Young
  • hi, thank you so much for your answer ! indeed, XSL would be perfect. but unfortunately, I've never used it before (I'll have a class on it from monday onwards). **two things** : first, I tested your XSL but it didn't change the document, is there something wrong with the XSL stylesheet? second, my target isn't `rdg[@wit = '#brl']` but `rdg[contains(@wit, 'brl')]` (sometimes, a single `rdg` is used for more than one witness, and I would need to update all of the `rdg` which contain a `brl` witness). – paulhector Jan 21 '22 at 10:58
  • @paulhector, thanks for your response and trying it out. I've updated my answer with "contains" logic, and I've also included how I'm using the transform, 'cause I'm not sure why you're not seeing a difference... how "deep" is the `<app>` node? Do its parents have any namespaces? – Zach Young Jan 21 '22 at 18:16
  • thank you for your updated and commented answer, i really appreciate it ! i tried it out with your test input and it worked perfectly. however, i tried it with my own file, which is much larger (2000 lines) and much more complex, and it did not work. the file is using only the TEI namespace, but with quite a complex structure, nested `app`s and so on. i figure the added complexity complicates the XSLT (or XQuery) transforms. i ended up transforming the file by hand. – paulhector Jan 24 '22 at 09:37
  • Wow, 2000 lines, I'm figuring something like three to five hundred `<app>`'s by hand! That's rough. Yeah, namespaces can make an XSLT that seems 100% correct act 0% correct. I poked around the TEI website and didn't find anything super helpful there about XSLT and namespaces, but I did find the namespace URI they use. I added that to my sample input and added some more structure. The XSLT now works and is hopefully more representative. Here's hoping you don't have to do that much work by hand in the future. Bring me in to a discussion next time, I'd like to help. – Zach Young Jan 24 '22 at 17:50
  • Thank you so much for updating it ! It was a bit long to do, but after 2 weeks of trying to write a script to automatize the process, I gave up... I'll be sure to try it out :) – paulhector Feb 03 '22 at 16:56