
I am trying to get a Scala XML node's tag together with its attributes. I would like to get just the tag name and its attributes, without the child elements.

I have this input:

<substance-classes>
    <nucleic-acid-sequence display-name="Nucleic Acid Sequence">
        <nucleic-acid-base>
            <base-symbol>a</base-symbol>
            <count>295</count>
        </nucleic-acid-base>
        <nucleic-acid-base>
            <base-symbol>c</base-symbol>
            <count>329</count>
        </nucleic-acid-base>
        <nucleic-acid-base>
            <base-symbol>g</base-symbol>
            <count>334</count>
        </nucleic-acid-base>
        <nucleic-acid-base>
            <base-symbol>t</base-symbol>
            <count>268</count>
        </nucleic-acid-base>
    </nucleic-acid-sequence>
    <genbank-information>
        <genbank-accession-number>EU186063</genbank-accession-number>
    </genbank-information>
</substance-classes>

I am trying to replace the contents of <nucleic-acid-sequence> like this:

val newNucleicAcidSequenceNode = <nucleic-acid-sequence>{ myfunction }</nucleic-acid-sequence>

But some <nucleic-acid-sequence> elements have attributes, like <nucleic-acid-sequence display-name="Nucleic Acid Sequence">. Since newNucleicAcidSequenceNode is built from a hardcoded tag, I lose those attributes.

How do I retain the optional attributes and still pass { myfunction } to the <nucleic-acid-sequence> tag?

ashawley
nancy

1 Answer


If I have understood you correctly:

  • you want to replace just part of your XML
  • that part consists of the children of every nucleic-acid-sequence element under substance-classes
  • you don't want to lose any attributes of those nucleic-acid-sequence elements
  • the new children are computed by a function (myFunction)

In that case, my answer would be:

import scala.xml.{Node, Elem}

val myXml: Elem =
      <substance-classes>
        <nucleic-acid-sequence display-name="Nucleic Acid Sequence">
          <nucleic-acid-base>
            <base-symbol>a</base-symbol>
            <count>295</count>
          </nucleic-acid-base>
          <nucleic-acid-base>
            <base-symbol>c</base-symbol>
            <count>329</count>
          </nucleic-acid-base>
          <nucleic-acid-base>
            <base-symbol>g</base-symbol>
            <count>334</count>
          </nucleic-acid-base>
          <nucleic-acid-base>
            <base-symbol>t</base-symbol>
            <count>268</count>
          </nucleic-acid-base>
        </nucleic-acid-sequence>
        <genbank-information>
          <genbank-accession-number>EU186063</genbank-accession-number>
        </genbank-information>
      </substance-classes>

def myFunction(children: Seq[Node]) : Seq[Node] = ??? // whatever you want it to be

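// Here's the replacement. Only direct children of the root that are
// <nucleic-acid-sequence> elements are rewritten; Elem.copy keeps their
// prefix, label, attributes (e.g. display-name) and namespace scope.
myXml.copy(child = myXml.child.map {
  case e@Elem(_, "nucleic-acid-sequence", _, _, children@_*) =>
    e.asInstanceOf[Elem].copy(child = myFunction(children))
  case other => other
})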
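The snippet above only rewrites nucleic-acid-sequence elements that are direct children of the root. If they can also appear deeper in the tree, one option is a sketch using scala-xml's `scala.xml.transform.RewriteRule` and `RuleTransformer`, which is a different technique from the `copy`-based replacement. `ReplaceAtAnyDepth` and its placeholder `myFunction` are hypothetical names used only for illustration:

```scala
import scala.xml.{Elem, Node}
import scala.xml.transform.{RewriteRule, RuleTransformer}

object ReplaceAtAnyDepth {
  // Hypothetical placeholder for myFunction: keeps only element children,
  // which drops the whitespace text nodes between them.
  def myFunction(children: Seq[Node]): Seq[Node] =
    children.collect { case e: Elem => e }

  // Rewrites every <nucleic-acid-sequence>, wherever it sits in the tree.
  // Elem.copy keeps the prefix, label, attributes and namespace scope.
  private val replaceSequences: RewriteRule = new RewriteRule {
    override def transform(n: Node): Seq[Node] = n match {
      case e: Elem if e.label == "nucleic-acid-sequence" =>
        Seq(e.copy(child = myFunction(e.child.toSeq)))
      case other => Seq(other)
    }
  }

  // BasicTransformer.apply expects the root to map to exactly one node.
  def apply(root: Node): Node =
    new RuleTransformer(replaceSequences).apply(root)
}
```

Applied to the document from the question, `ReplaceAtAnyDepth(myXml)` should rewrite the same element as the `copy`-based snippet, and it also reaches nested occurrences.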

For instance, myFunction could keep only the children whose count is above 300, with something like:

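import scala.util.{ Try, Success }

// Keep only the children whose <count> parses to an Int above 300.
// Whitespace text nodes and nodes without a numeric <count> make the Try
// fail, so they are dropped too.
def myFunction(children: Seq[Node]): Seq[Node] = children.collect {
  case e: Node if Try((e \ "count").text.toInt > 300) == Success(true) =>
    e
}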

If you replace the unimplemented myFunction in the first snippet with this one, the replacement gives:

  <substance-classes>
    <nucleic-acid-sequence display-name="Nucleic Acid Sequence"><nucleic-acid-base>
        <base-symbol>c</base-symbol>
        <count>329</count>
      </nucleic-acid-base><nucleic-acid-base>
        <base-symbol>g</base-symbol>
        <count>334</count>
      </nucleic-acid-base></nucleic-acid-sequence>
    <genbank-information>
      <genbank-accession-number>EU186063</genbank-accession-number>
    </genbank-information>
  </substance-classes>

As you can see, no attribute of nucleic-acid-sequence is lost, and the function has kept two of the four nucleic-acid-base nodes: those whose count is above 300.
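If you would rather keep the hardcoded literal from your question, another option (a sketch, not part of the answer above) is to build the new element exactly as before and then re-attach the original element's attributes with scala-xml's `Elem.%` operator. `KeepAttributesWithLiteral`, `rebuild` and the pass-through `myfunction` are hypothetical names for illustration:

```scala
import scala.xml.{Elem, Node}

object KeepAttributesWithLiteral {
  // Hypothetical stand-in for the question's myfunction; here it simply
  // returns the children it is given.
  def myfunction(children: Seq[Node]): Seq[Node] = children

  // Build a fresh literal as in the question, then copy the original
  // element's attributes onto it with Elem's % operator.
  // Note: the literal carries no namespace prefix of its own.
  def rebuild(original: Elem): Elem =
    <nucleic-acid-sequence>{ myfunction(original.child.toSeq) }</nucleic-acid-sequence> % original.attributes

  def main(args: Array[String]): Unit = {
    val seq =
      <nucleic-acid-sequence display-name="Nucleic Acid Sequence"><count>295</count></nucleic-acid-sequence>
    println(rebuild(seq) \@ "display-name") // prints: Nucleic Acid Sequence
  }
}
```

Unlike `copy`, this rebuilds the element from scratch, so anything other than the attributes (for example a namespace prefix) has to be carried over by hand.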

Hope it helps.