
I currently have the following XML, but I'm having trouble processing it because the date and time are contained within a single element:

<data>
    <StartDateTime>2019-10-19T12:00:00Z</StartDateTime>
</data>

but I want the output to be:

<data>
   <date>2019-10-19</date>
   <time>12:00:00Z</time>
</data>

Is it possible to do this with sed?

Samosa
[You can't parse \[X\]HTML with regex](http://stackoverflow.com/a/1732454/3776858). I suggest using an XML/HTML parser (e.g. xmlstarlet). – Cyrus Oct 17 '19 at 17:11

1 Answer


@Cyrus is right when he says that [X]HTML cannot be parsed with a regex.

But if you are sure the input will always look like this, and since it is this simple, you can in fact do it with sed:

sed -E 's|<StartDateTime>([0-9]{4}-[0-9]{2}-[0-9]{2})T([0-9]{2}:[0-9]{2}:[0-9]{2}[A-Z])</StartDateTime>|<date>\1</date>\n    <time>\2</time>|g'

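This expression uses capturing groups, which you can reference in the replacement with a backslash followed by the group's index: \1 (the date) and \2 (the time) in this case. Note that \n in the replacement is a GNU sed extension. With other seds, the portable way to insert a newline is a backslash followed by a literal line break.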
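To check the command against the sample input from the question, here is a minimal sketch that wraps it in a shell function. The function name `split_datetime` is hypothetical, and the sketch assumes GNU sed, since the replacement relies on `\n` producing a newline.

```shell
#!/bin/sh
# split_datetime: hypothetical helper name wrapping the sed command from the answer.
# Assumes GNU sed, where \n in the replacement emits a newline.
split_datetime() {
    sed -E 's|<StartDateTime>([0-9]{4}-[0-9]{2}-[0-9]{2})T([0-9]{2}:[0-9]{2}:[0-9]{2}[A-Z])</StartDateTime>|<date>\1</date>\n    <time>\2</time>|g'
}

# Feed the sample input from the question through the helper.
# The leading indentation of the <StartDateTime> line is kept, because sed
# only replaces the matched element, not the whitespace before it.
split_datetime <<'EOF'
<data>
    <StartDateTime>2019-10-19T12:00:00Z</StartDateTime>
</data>
EOF

# Expected output with GNU sed:
# <data>
#     <date>2019-10-19</date>
#     <time>12:00:00Z</time>
# </data>
```

Because the time group ends in `[A-Z]`, a timestamp with a numeric offset such as `2019-10-19T12:00:00+02:00` does not match and passes through unchanged. This is one reason the approach is only safe when the input always looks like the sample.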

ProXicT