I have an XML file similar to the following:

<?xml version="1.0" encoding="UTF-8"?>
<csw:GetRecordByIdResponse xmlns:csw="http://www.opengis.net/cat/csw/2.0.2">
  <xmlns:gmi="http://sdi.eurac.edu/metadata/iso19139-2/schema/gmi" xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:gco="http://www.isotc211.org/2005/gco" xmlns:gml="http://www.opengis.net/gml" xmlns:geonet="http://www.fao.org/geonetwork" gco:isoType="gmd:MD_Metadata">
    <gmd:onLine>
                  <gmd:CI_OnlineResource>
                    <gmd:linkage>
                      <gmd:URL>http://server.test.it/geoserver/test_product/wms?SERVICE=WMS&amp;TIME=2018-06-14T10:59:00Z&amp;</gmd:URL>
                    </gmd:linkage>
                    <gmd:protocol>
                      <gco:CharacterString>OGC:WMS-1.1.1-http-get-map</gco:CharacterString>
                    </gmd:protocol>
                    <gmd:name>
                      <gco:CharacterString>test_product:test_product</gco:CharacterString>
                    </gmd:name>
                    <gmd:description>
                      <gco:CharacterString>test_product:test_product</gco:CharacterString>
                    </gmd:description>
                  </gmd:CI_OnlineResource>
    </gmd:onLine>
</csw>

I would like to replace the content of the gmd:URL tag with the following:

http://server.test.it/geoserver/test_product/wms?SERVICE=WMS&version=1.1.0&request=GetMap&layers=test_product:test_product&styles=&bbox=140442.2309,3739661.3694,1330442.2309,2564661.3694&width=768&height=576&srs=EPSG:32632&format=application/openlayers&TIME=2018-06-14T10:59:00Z&amp;

I used the sed command in bash:

correct_url='http://server.test.it/geoserver/test_product/wms?SERVICE=WMS&amp;version=1.1.0&amp;request=GetMap&amp;layers=test_product:test_product&amp;styles=&amp;bbox=140442.2309,3739661.3694,1330442.2309,2564661.3694&amp;width=768&amp;height=576&amp;srs=EPSG:32632&amp;format=application/openlayers&amp;TIME=2018-06-14T10:59:00Z&amp;'
sed -i 's/<gmd:URL>\(.*\)<\/gmd:URL>/<gmd:URL>'"${correct_url}"'<\/gmd:URL>/' xml_file.xml

It gives me an error:

sed: -e expression #1, char 52: unknown option to `s'

Could you please tell me what I'm doing wrong?

UPDATE:

Following the suggestion of @rubystallion, I tried to escape all the special characters:

correct_url='http://server.test.it/geoserver/test_product/wms?SERVICE=WMS&amp;version=1.1.0&amp;request=GetMap&amp;layers=test_product:test_product&amp;styles=&amp;bbox=140442.2309,3739661.3694,1330442.2309,2564661.3694&amp;width=768&amp;height=576&amp;srs=EPSG:32632&amp;format=application/openlayers&amp;TIME=2018-06-14T10:59:00Z&amp;'
correct_url_escaped="${correct_url//\//\\\/}"
correct_url_escaped="${correct_url_escaped//&/\\&}"
correct_url_escaped="${correct_url_escaped/\?/\?}"
correct_url_escaped="${correct_url_escaped/\?/\?}"
correct_url_escaped="${correct_url_escaped//\;/\;}"
correct_url_escaped="${correct_url_escaped//\=/\=}"

sed -i 's/<gmd:URL>\(.*\)<\/gmd:URL>/<gmd:URL>'"${correct_url_escaped}"'<\/gmd:URL>/' xml_file.xml

But I'm still getting an error:

sed: -e expression #1, char 47: unknown option to `s'

Am I still missing something?

spinkus
sylar_80
  • Don't use `sed` to modify XML; instead, use an XML-aware tool. – choroba Jul 18 '18 at 06:20
  • Your XML is not valid: `xmllint` returns many `namespace error : Namespace prefix gmd on ... is not defined`. – choroba Jul 18 '18 at 06:23
  • @choroba I added the namespaces. I forgot to write them. – sylar_80 Jul 18 '18 at 06:31
  • [Don't Parse XML/HTML With Regex.](https://stackoverflow.com/a/1732454/3776858) I suggest using an XML/HTML parser (xmlstarlet, xmllint ...). – Cyrus Jul 18 '18 at 06:38
  • @Cyrus something like `xmlstarlet ed -u "//gmd:url" -v $correct_url xml_file.xml`? – sylar_80 Jul 18 '18 at 06:46
  • @sylar_80, you'd probably need to declare your namespace, as in `xmlstarlet ed -N gmd="http://www.isotc211.org/2005/gmd" -u '//gmd:url' -v "$correct_url"` to make it work, but something very much like that, yes. – Charles Duffy Jul 19 '18 at 00:29

2 Answers

As the commenters have mentioned, you can write more maintainable scripts and avoid errors by using XML-aware tools, but let me show you why your code doesn't work:

Bash replaces variables inside strings with their contents before it runs the command, so sed receives the URL as part of its script. Every / in the URL is then parsed as a delimiter of the s command, and every & in the replacement is read as "the whole match". If you escape these two characters with backslashes, your command will work as intended:

correct_url='http://server.test.it/geoserver/test_product/wms?SERVICE=WMS&amp;version=1.1.0&amp;request=GetMap&amp;layers=test_product:test_product&amp;styles=&amp;bbox=140442.2309,3739661.3694,1330442.2309,2564661.3694&amp;width=768&amp;height=576&amp;srs=EPSG:32632&amp;format=application/openlayers&amp;TIME=2018-06-14T10:59:00Z&amp;'
correct_url_escaped="${correct_url//\//\\\/}"
correct_url_escaped="${correct_url_escaped//&/\\&}"

sed -i 's/<gmd:URL>\(.*\)<\/gmd:URL>/<gmd:URL>'"${correct_url_escaped}"'<\/gmd:URL>/' xml_file.xml

Also, next time please make sure that the commands in your question behave as described when run as posted. You forgot to put quotes around the variables.
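The escaping can also be wrapped in one reusable step. The following is only a minimal sketch: the helper name `escape_sed_replacement`, the shortened URL and the sample line are made up for illustration, and it assumes the value contains no newlines. It lets sed do the escaping, so it does not depend on how bash treats `&` inside `${var//...}` (which, as far as I know, changed in bash 5.2 with the `patsub_replacement` option).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the question): backslash-escape the three
# characters that are special in the replacement part of s/.../.../ when "/"
# is the delimiter: backslash, slash and ampersand.
escape_sed_replacement() {
    printf '%s\n' "$1" | sed 's|[\\/&]|\\&|g'
}

# Shortened, made-up URL for illustration.
new_url='http://server.test.it/wms?SERVICE=WMS&amp;version=1.1.0'
escaped=$(escape_sed_replacement "$new_url")

# Apply the substitution to a sample line instead of a real file.
line='<gmd:URL>http://server.test.it/old</gmd:URL>'
result=$(printf '%s\n' "$line" |
    sed 's/<gmd:URL>.*<\/gmd:URL>/<gmd:URL>'"$escaped"'<\/gmd:URL>/')
printf '%s\n' "$result"
```

Once the result looks right on a sample line, the same sed expression can be run with `-i` against the real file.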

Johannes Riecken
  • Hi @rubystallion, unfortunately I still get the same error: _sed: -e expression #1, char 47: unknown option to `s'_. PS: I added the quotes. Thanks! – sylar_80 Jul 18 '18 at 08:50
  • You don't have to escape the question mark, because you're inserting the URL into the replacement part of the substitution, where question marks don't have a special meaning. If you copy the code I gave verbatim into a file `script.sh` in the same directory as your XML file and then run `bash script.sh`, it should work. To escape special characters, you have to use backslashes, which is what I did in the second and third line using bash substitution. – Johannes Riecken Jul 18 '18 at 15:40

Your URL contains characters that are special to sed (/ and &), and you are substituting the URL straight into the sed script. If you place an echo in front of your sed command line, you'll see what is actually executed, which clearly isn't a valid sed command.
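To make that concrete, here is a small sketch that prints the sed script exactly as sed would receive it. The URL is a shortened, made-up stand-in for the one in the question, and `printf` is used in place of `echo` only because it prints backslashes unchanged in every shell.

```shell
#!/usr/bin/env bash
# Shortened, made-up URL for illustration.
url='http://server.test.it/wms?SERVICE=WMS&amp;TIME=2018-06-14T10:59:00Z'

# Build the sed script the same way the question does, then print it
# instead of running it.
expr='s/<gmd:URL>\(.*\)<\/gmd:URL>/<gmd:URL>'"${url}"'<\/gmd:URL>/'
printf '%s\n' "$expr"
```

In the printed script, the replacement ends at the first unescaped / after `http:`, so sed tries to read the rest of the URL as flags of the s command. That is where "unknown option to `s'" comes from.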

You need to escape the URL, or just not place it directly into your sed command. You can achieve the latter with GNU sed's e flag, which runs the line resulting from the substitution as a shell command and replaces it with that command's output. Like this:

url="http://x:y@www.a.com/foo?a=b&c=d" sed -r -i 's/(\s*)<gmd:URL>(.*)<\/gmd:URL>/echo "\1<gmd:URL>$url<\/gmd:URL>"/e' xml_file.xml

Note that you should be cautious with the e flag: the whole resulting line is handed to a shell, so anything else on a matching line gets executed too, which is a potential security issue.

Also, please heed the generally good advice about using an XML editing tool to edit XML (for simple one-off jobs like this, IMO it's fine to use sed if it's the quickest way to get it done ...).
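For comparison, here is a rough sketch of the XML-aware route the commenters suggest. It assumes xmlstarlet is installed, and it works on a small made-up file rather than the full document from the question. Note that XPath names are case-sensitive, so the element is matched as `gmd:URL`, and the namespace prefix has to be declared with `-N`. The new URL is passed with a plain `&`; xmlstarlet should write it out as `&amp;` itself, but check the output to confirm.

```shell
#!/usr/bin/env bash
# Made-up minimal document, only for illustration.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<gmd:onLine xmlns:gmd="http://www.isotc211.org/2005/gmd">
  <gmd:linkage>
    <gmd:URL>http://server.test.it/old</gmd:URL>
  </gmd:linkage>
</gmd:onLine>
EOF

# Shortened, made-up URL with a raw ampersand (no manual escaping).
new_url='http://server.test.it/wms?SERVICE=WMS&version=1.1.0'

if command -v xmlstarlet >/dev/null 2>&1; then
    # -L edits the file in place, -N declares the namespace prefix,
    # -u selects the nodes to update and -v gives the new value.
    xmlstarlet ed -L -N gmd="http://www.isotc211.org/2005/gmd" \
        -u '//gmd:URL' -v "$new_url" "$tmp"
    cat "$tmp"
else
    echo "xmlstarlet not installed; skipping" >&2
fi
```

Nothing here needs escaping for sed, because the URL never becomes part of a sed script.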

spinkus