I have two XML files and I want to

  1. find a specific XML node in File A
  2. copy it
  3. find a specific section in File B
  4. paste the copied node.

sed is already in use on my machine, but I'm having trouble finding the right regex.

Example for File A:

<Containers>
  <Container id="1">    <-- to be copied start
    blubb
  </Container>    <-- to be copied end
  <Container id="2">blobb</Container>
</Containers>

Example for File B:

<Containers>
  <Container id="99">blibb</Container>
</Containers>

Example of the required output for File B, after copying everything from <Container id="1"> to </Container>:

<Containers>
  <Container id="1">    <-- copied here start
    blubb
  </Container>    <-- copied here end
  <Container id="99">blibb</Container>
</Containers>

I know it would be much cleaner, and maybe easier, to use an XML parser or other tooling, but I need to use sed, and I'm not a very experienced sed/regex user. I have only played around a little with "substitute" and "delete".

To clarify:

  • I NEED to use sed since this is the only tool that is available on the machine.
  • I know how to do this in other programming languages and with other tools, but that is not possible here. The machine where this will run is not under my control!

I know I shouldn't be using regex for XML/XHTML, but the real world is more complicated.

I'm running this from cygwin.

Update 1:

Judging by several responses, it does not seem possible to find a solution with sed. Thanks to all who understood the problem and tried to help!

If someone still sees a potential solution, please let me know. But the challenge is in using sed. I have used XML parsers with Boost, Qt, C#, Java, ... but that's simply not the problem here, and if I could choose... I can't.

Update 2:

Thank you all, and especially Benjamin W. It is definitely possible to solve this problem with sed, but as stated many times: if you are able to use an XML parsing library and another technology, that should be the way to go.

For me, a non-technical problem (a pseudo security guideline) has been solved with the available technical solution.

This was my final solution:

sed "/<Container id=\"1\">/,/<\/Container>/!d" fileA.xml | ^
sed -i "/<Containers>/r /dev/stdin" fileB.xml

Thank you.

fpdragon
  • Ruby, Perl, Python, Swift, all have easy xml parsers. Don't try and use a line oriented 1980's ERE regex tool to parse a block oriented grammar. Square peg => round hole. Don't use a hammer. – dawg Aug 09 '17 at 15:43
  • *I'm not a very experienced sed/regex user* This is a bad way to try and learn... – dawg Aug 09 '17 at 15:45
  • I think you're missing the point - it's worse than "more difficult" to use Regex to parse arbitrary XML, it's actually logically impossible. If you can't use an XML parser, then you can't do the project. – EJoshuaS - Stand with Ukraine Aug 09 '17 at 15:49
  • As the title says: sed is the only tooling that is available. There is no other solution than using sed or doing it manually. – fpdragon Aug 09 '17 at 15:49
  • As pointed out in the comments, sed simply can't do that. That's like saying that you want to haul a 10-ton load with a VW Beetle - it doesn't matter that that's the only car that's available, it's *still* not going to happen, so you either need to find a different car somehow or accept the fact that the load isn't going anywhere. – EJoshuaS - Stand with Ukraine Aug 09 '17 at 15:54
  • I was thinking of something like cutting from FileA to a temp file and afterwards inserting this in FileB. Or maybe even more complicated. – fpdragon Aug 09 '17 at 15:57
  • Why can't you use a different tool? There are plenty of libraries that you can deploy with your solution if you're not able to install stuff on the remote computer. – EJoshuaS - Stand with Ukraine Aug 09 '17 at 15:57
  • If you only have sed available -- probably no go. If you have awk, bash available -- perhaps. – dawg Aug 09 '17 at 15:58
  • The machine is not under my control; for security reasons I'm not allowed to. I know, even for me it's hard to understand, but this is not my decision ;-) – fpdragon Aug 09 '17 at 16:00
  • You can deploy libraries with your software, though, right? Why not just use a library you can deploy with your code instead of an external tool? – EJoshuaS - Stand with Ukraine Aug 09 '17 at 16:02
  • It's not a question of what I can do. The problem is what I am allowed to do. – fpdragon Aug 09 '17 at 16:04
  • `cygwin` usually comes with `bash`, `awk`, `perl`...has it all been removed? What's the security rationale? Can't you just grab the awk binary? cmder? git-for-windows? – simlev Aug 09 '17 at 16:10
  • *The problem is what I am allowed to do.* Then by logical extension: if you do not have access to any tool beside `sed` the administrator of the computer does not want you to edit xml files. If you do have awk or Bash, it is possible, but not robust. – dawg Aug 09 '17 at 16:11
  • It's also possible with sed, just super brittle. I have a one-liner that does exactly what the question asks, but can break in many, many ways. Really probably only solves exactly the minimal example. And would fail for anything that's different just the slightest bit. PS. I'm not saying "parsing XML is possible with sed". I'm saying "one-off, error prone hacks are possible". – Benjamin W. Aug 09 '17 at 16:12
  • omg... thanks for the responses but still... I can't give you good answers why I'm in this situation. I just can say that I'm also not happy with this and I have asked the same "security questions" like you do. I had to accept this so maybe no solution is the best I can get xD I do understand that mainly this is no technical problem. But I was asked to find a technical solution for a non technical problem. – fpdragon Aug 09 '17 at 16:17
  • @Benjamin W: I'd love to see what you have and I agree what you are saying. – fpdragon Aug 09 '17 at 16:19

1 Answer

Here is a sed command that does what the example asks for. Let me present it first, then list how it will break:

sed '/<Container id="1">/,/<\/Container>/!d' fileA.xml |
    sed '/<Containers>/r /dev/stdin' fileB.xml

resulting in

<Containers>
  <Container id="1">
    blubb
  </Container>
  <Container id="99">blibb</Container>
</Containers>

This requires GNU sed to read standard input from the special file /dev/stdin; without GNU sed, the output of the first command can be saved into a temp file and then read from there.
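
For a sed that does not treat /dev/stdin specially, the temp-file variant might look like the sketch below. The names node.tmp and fileB.new.xml are arbitrary choices, and the sample files from the question are recreated in a scratch directory only so that the snippet runs on its own.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Recreate the sample files from the question (illustration only).
cat > fileA.xml <<'EOF'
<Containers>
  <Container id="1">
    blubb
  </Container>
  <Container id="2">blobb</Container>
</Containers>
EOF
cat > fileB.xml <<'EOF'
<Containers>
  <Container id="99">blibb</Container>
</Containers>
EOF

# Step 1: extract the node into a temp file (node.tmp is a made-up name).
sed '/<Container id="1">/,/<\/Container>/!d' fileA.xml > node.tmp

# Step 2: queue the temp file for output after the line matching <Containers>.
sed '/<Containers>/r node.tmp' fileB.xml > fileB.new.xml

rm -f node.tmp
cat fileB.new.xml
```

Writing to a new file instead of editing in place avoids -i, which is not part of POSIX sed either.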

The first command looks for a line range starting with a line matching <Container id="1"> and ending with a line matching <\/Container>. Everything outside of that range is deleted.
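
The same selection can also be written with -n and p, which some people find easier to read than !d. A small sketch on the sample fileA.xml, recreated in a scratch directory so the snippet is self-contained (the variable names are only for illustration):

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

cat > fileA.xml <<'EOF'
<Containers>
  <Container id="1">
    blubb
  </Container>
  <Container id="2">blobb</Container>
</Containers>
EOF

# Delete every line that is NOT (!) inside the address range.
with_delete=$(sed '/<Container id="1">/,/<\/Container>/!d' fileA.xml)

# Equivalent: suppress automatic printing (-n) and print only the range.
with_print=$(sed -n '/<Container id="1">/,/<\/Container>/p' fileA.xml)

printf '%s\n' "$with_delete"
```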

The second command looks for a line matching <Containers> and then inserts the output of the first command with r.
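
To see where r places its text, here is a tiny sketch with made-up files (snippet.txt and target.txt are hypothetical names, not part of the question). The file's content is written after the matching line, and once for every line that matches:

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

printf '%s\n' 'INSERTED' > snippet.txt
printf '%s\n' 'head' 'marker' 'middle' 'marker' 'tail' > target.txt

# r queues snippet.txt to be written after each line matching /marker/.
result=$(sed '/marker/r snippet.txt' target.txt)
printf '%s\n' "$result"
```

A regular file is used here instead of /dev/stdin because a pipe can only be read once, whereas a file can be read again for each match.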

Here is how this can break:

  • Any changes in whitespace (an extra space, as in <Container  id="1">, and it breaks)
  • Any differences in linebreaks
    • Closing tag on the same line as opening tag: breaks
    • <Containers> not on a line on its own: breaks
    • Next node starts on same line as closing tag </Container>: breaks
  • Any <Container> child node elsewhere with ID 1
  • Any other <Containers> node in fileB.xml
  • Any nesting with the same node names

...and many more.
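
Two of the failure modes listed above can be reproduced with the sketch below. The files oneline.xml and spaced.xml are made-up variations of fileA.xml, created in a scratch directory purely for illustration.

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Variation 1: node 1 has its closing tag on the same line as the opening tag.
cat > oneline.xml <<'EOF'
<Containers>
  <Container id="1">blubb</Container>
  <Container id="2">blobb</Container>
</Containers>
EOF

# The range end is only tested on lines AFTER the one that opened the range,
# so the range runs on until the next line containing </Container>.
extracted=$(sed '/<Container id="1">/,/<\/Container>/!d' oneline.xml)
printf '%s\n' "$extracted"

# Variation 2: an extra space inside the tag means the start pattern
# never matches, so nothing at all is extracted.
printf '%s\n' '<Container  id="1">' 'blubb' '</Container>' > spaced.xml
spaced=$(sed '/<Container id="1">/,/<\/Container>/!d' spaced.xml)
printf 'extracted from spaced.xml: [%s]\n' "$spaced"
```

In the first case the node with id 2 is copied along with node 1; in the second case the command succeeds silently but copies nothing.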

As pointed out in the comments, this should really be a very last resort. You'd probably be better off copying your input files to a machine where you have the proper tools, and copying them back afterwards, than using this.

Benjamin W.