I have a simple XML file (an Atom feed):

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>title</title>
  <link rel="self" href="http://ushakova.city/articles/feed/"/>
  <updated>2019-11-04T12:45:00Z</updated>
  <id>http://myurl/articles/feed/?dt=2019-11-04T12:45:00Z</id>
  <entry>
    <id>http://myurl/articles/117/</id>
    <link rel="alternate" type="text/html" href="https://yandex.ru/news/story/V_Tyumenskom_rajone_vybirayut_luchshego_pomoshhnika_vospitatelya--5d543025944f00fecce8c11031573354?lang=ru&amp;amp;from=rss&amp;amp;stid=cF2eQWK2DdnB"/>
    <author>
      <name/>
    </author>
    <published>2019-11-04T12:45:00Z</published>
    <updated>2019-11-04T12:45:00Z</updated>
    <title type="html"><![CDATA[В Тюменском районе выбирают лучшего помощника воспитателя]]></title>
    <content type="html"><![CDATA[]]></content>
  </entry>
</feed>

Validating it:

xmlstarlet val _1.xml 
_1.xml - valid

Viewing its structure:

xmlstarlet el _1.xml 

feed
feed/title
feed/link
feed/updated
feed/id
feed/entry
feed/entry/id
feed/entry/link
feed/entry/author
feed/entry/author/name
feed/entry/published
feed/entry/updated
feed/entry/title
feed/entry/content

Trying to remove the entry section:

xmlstarlet ed -d "//entry" _1.xml
xmlstarlet ed -d "/feed/entry" _1.xml

But nothing is removed. I have a few questions:

  1. What am I doing wrong?
  2. How do I remove the section?
  3. How do I remove the section if link/@href does not start with http://myurl?
Anton Shevtsov
  • You need to take the `http://www.w3.org/2005/Atom` namespace into account. See https://stackoverflow.com/q/44186213/407651 – mzjn Nov 06 '19 at 11:56
  • @mzjn `xmlstarlet ed -N WHAT-PREFIX-HERE=http://www.w3.org/2005/Atom`? – Anton Shevtsov Nov 06 '19 at 12:06
  • There is a simplified syntax for the default namespace (see the answer to the linked question). This should work for you: `xmlstarlet ed -d "//_:entry" _1.xml`. – mzjn Nov 06 '19 at 12:08
  • @mzjn Thanks, that worked! One last question: how do I delete the entry node if link/@href != http://myurl? Maybe with a regexp? Is that possible? – Anton Shevtsov Nov 06 '19 at 12:13
  • Use a similar approach to what's suggested in [my answer](https://stackoverflow.com/questions/58706146/bash-remove-xml-nodes-if-the-attribute-value-of-a-child-node-does-not-equal-a/58728029#58728029) to your last question, i.e. use the `-N` option to bind the namespace to an arbitrarily named `x` prefix. Then reference the elements using `x:feed` and `x:entry`. For instance: `xml ed -N x="http://www.w3.org/2005/Atom" -d "/x:feed/x:entry" _1.xml`. See [Common problems: Namespaces and default namespace](http://xmlstar.sourceforge.net/doc/UG/ch05s01.html) in the docs. Or use the `local-name()` function. – RobC Nov 06 '19 at 12:58
  • @AntonShevtsov - Your point no. 3, i.e. _"How remove [sic] section if link/@href do not started with `http://myurl`"_ - How does that differ from what you asked in your [previous question](https://stackoverflow.com/questions/58706146/bash-remove-xml-nodes-if-the-attribute-value-of-a-child-node-does-not-equal-a/58728029)? What _"section"_ are you referring to this time? – RobC Nov 06 '19 at 13:34
  • @RobC Thanks! I commented on your answer in my previous question. – Anton Shevtsov Nov 07 '19 at 12:13
  • Does this answer your question? [Bash - Remove XML nodes if the attribute value of a child node does not equal a specific value?](https://stackoverflow.com/questions/58706146/bash-remove-xml-nodes-if-the-attribute-value-of-a-child-node-does-not-equal-a) – Anton Shevtsov Nov 14 '19 at 04:22
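
The namespace point raised in the comments can be cross-checked outside xmlstarlet. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`, not the approach from the thread itself. It shows that an unqualified `entry` path matches nothing in an Atom feed, and that a namespace-qualified path does. The trimmed `FEED` sample, its second entry, the `a` prefix, and the helper name `remove_foreign_entries` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Arbitrary prefix bound to the Atom namespace (the prefix name is our choice).
NS = {"a": "http://www.w3.org/2005/Atom"}

# Trimmed sample modelled on the feed in the question; the second entry is
# hypothetical and exists only to show an entry that survives the filter.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>title</title>
  <entry>
    <id>http://myurl/articles/117/</id>
    <link rel="alternate" type="text/html" href="https://yandex.ru/news/story/example"/>
  </entry>
  <entry>
    <id>http://myurl/articles/118/</id>
    <link rel="alternate" type="text/html" href="http://myurl/articles/118/"/>
  </entry>
</feed>"""


def remove_foreign_entries(xml_text, prefix="http://myurl"):
    """Drop every entry that has no link whose href starts with `prefix`."""
    root = ET.fromstring(xml_text)
    # findall() returns a list, so removing children while looping is safe.
    for entry in root.findall("a:entry", NS):
        hrefs = [link.get("href", "") for link in entry.findall("a:link", NS)]
        if not any(href.startswith(prefix) for href in hrefs):
            root.remove(entry)
    return root


if __name__ == "__main__":
    # The unqualified path finds nothing, which mirrors the failed -d "//entry".
    print(len(ET.fromstring(FEED).findall(".//entry")))
    cleaned = remove_foreign_entries(FEED)
    print([e.findtext("a:id", namespaces=NS) for e in cleaned.findall("a:entry", NS)])
```

The same predicate can be written in XPath 1.0 for xmlstarlet, presumably along the lines of `xmlstarlet ed -N a="http://www.w3.org/2005/Atom" -d '//a:entry[not(a:link[starts-with(@href,"http://myurl")])]' _1.xml`, so no regular expression should be needed for a simple prefix test.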

0 Answers