I need to substitute owl:Class in the following text with the LP number

Input

<owl:Class rdf:about="https://loinc.org/LP173100-1">
        <rdfs:subClassOf rdf:resource="https://loinc.org/LP410935-3"/>
        <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Counseling (LP)</rdfs:label>
        <skos:prefLabel rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Counseling</skos:prefLabel>
        <loinc:hasCode rdf:datatype="http://www.w3.org/2001/XMLSchema#string">LP173100-1</loinc:hasCode>
    </owl:Class>

so that the substituted output looks like the following

Output

<LP173100-1 rdf:about="https://loinc.org/LP173100-1">
        <rdfs:subClassOf rdf:resource="https://loinc.org/LP410935-3"/>
        <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Counseling (LP)</rdfs:label>
        <skos:prefLabel rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Counseling</skos:prefLabel>
        <loinc:hasCode rdf:datatype="http://www.w3.org/2001/XMLSchema#string">LP173100-1</loinc:hasCode>
    </LP173100-1>

I have used `s/\(owl:Class\)\(.*org\/\)\(LP.*\)"/\3\2\3/g` for the first line, but I don't know how to apply it to the last line. Is there a more elegant solution? I have a huge file in which I need to make this substitution.

I am using a CentOS 7.7 Linux machine.

user2979872
  • Are you trying to replace a tag name in an xml node with a string like `"LP173100-1"`? If so, you should know xml tag names can't have `"` around them. Maybe you mean replace with `LP173100-1`? – Jack Fleeting Jul 14 '20 at 14:31
  • @JackFleeting Thanks for pointing it out. I made the change – user2979872 Jul 14 '20 at 14:34
  • Do you really want to use `sed` for this? Using an XML tool will probably be both easier and more robust. – tripleee Jul 14 '20 at 14:39
  • @tripleee I am open to suggestions. I just happen to be new to XML and sed – user2979872 Jul 14 '20 at 14:41
  • Probably see also https://meta.stackoverflow.com/questions/261561/please-stop-linking-to-the-zalgo-anti-cthulhu-regex-rant which mainly talks about HTML, but of course the same broad reasoning applies to any XML application or indeed any structured format. – tripleee Jul 14 '20 at 14:43
  • In that case, Lesson 1: `sed` is not appropriate for XML. – chepner Jul 14 '20 at 15:16
  • Absolutely agree w/ @chepner: xml and regex are like oil and water. Use something like xidel or xmlstarlet. – Jack Fleeting Jul 14 '20 at 16:01

1 Answer

You can replace the newline character (\n) with a different character (one that does not occur anywhere else in the file), so that the whole file becomes a single line, run the substitution, and then convert back.

cat foo.txt | tr '\n' '\r' | sed -e 's/\(owl:Class\)\(.*org\/\)\(LP.*\)"/"\3"\2\3/g'  | tr '\r' '\n'
kysna
  • Some `sed` variants will have trouble with very long lines. And of course, lose the ugly [useless `cat`.](https://stackoverflow.com/questions/11710552/useless-use-of-cat) – tripleee Jul 14 '20 at 14:38
  • The unbounded `.*` will consume as much of the string as it can. Downvoting as clearly untested. – tripleee Jul 14 '20 at 14:40
  • Question was about multiple line sed substitution and the expression is taken from the original post. – kysna Jul 14 '20 at 14:43
  • But then that regex will no longer work correctly when everything is a single line. Changing it to `[^\r]*` would work if your `sed` supports the symbolic notation `\r`; but many do not. You could switch to Perl instead of `sed` for portability, but then Perl offers many far superior ways to solve the problem without regex. – tripleee Jul 14 '20 at 15:08
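
The core difficulty raised above is that the closing `</owl:Class>` tag carries no LP number, so it has to be paired with the opening tag that precedes it. A minimal sketch of one way to do that in Python is shown here (Python is a swapped-in tool, not something proposed in the thread). It reads the file line by line and keeps a stack of the codes seen on opening tags. The function name `rename_classes` and the file names in the usage comment are hypothetical. It assumes that every `owl:Class` tag fits on a single line and that the code is the last path segment of `rdf:about`, as in the sample input. It is still text matching, not a real XML parser, so the caveats from the comments apply.

```python
import re

# Any owl:Class tag: opening, closing, or self-closing.
TAG_RE = re.compile(r'<(/?)owl:Class\b([^>]*)>')
# The LP code is taken to be the last path segment of rdf:about (assumption).
CODE_RE = re.compile(r'rdf:about="[^"]*/(LP[^"/]+)"')


def rename_classes(lines):
    """Yield lines with owl:Class tags renamed to the LP code of their opening tag."""
    stack = []  # codes of the currently open owl:Class elements

    def repl(match):
        closing, attrs = match.group(1), match.group(2)
        if closing:
            # Pair the closing tag with the most recent opening tag.
            code = stack.pop() if stack else None
            return f"</{code}>" if code else match.group(0)
        found = CODE_RE.search(attrs)
        code = found.group(1) if found else None
        if not attrs.rstrip().endswith("/"):
            # Not self-closing, so a closing tag will follow later.
            stack.append(code)
        # Tags without an LP code (e.g. anonymous classes) are left unchanged.
        return f"<{code}{attrs}>" if code else match.group(0)

    for line in lines:
        yield TAG_RE.sub(repl, line)


# Hypothetical usage, streaming a large file without loading it into memory:
# with open("in.owl") as src, open("out.owl", "w") as dst:
#     dst.writelines(rename_classes(src))
```

Because the generator handles one line at a time, memory use stays flat even for a huge file, and the very-long-line problem of the `tr` approach does not arise. The stack means nested or anonymous `owl:Class` elements keep their closing tags matched to the right opening tag.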