I am trying to extract some specific information from an XML tag and convert it to a JSON string. I have come up with a very convoluted solution, but it almost works. I just need to remove the indentation and line breaks. I have tried s/[[:space:]]//g, but that also makes the words inside my values run together.

Sample data:

<config>
  <derivedFrom>
    <courseName>Family and Medical Leave</courseName>
    <courseCode>FML</courseCode>
    <courseAuthor>Company 1</courseAuthor>
    <courseVersion>2.0.0</courseVersion>
    <importLocale>en-US</importLocale>
  </derivedFrom>
</config>

This is the sed code I am using:

sed -n '
    /<derivedFrom>/ {
    :a;
    N;
    /<\/derivedFrom>/!ba;
    s/.*<derivedFrom>//;
    s/<\/derivedFrom>//;
    s/<\/[a-zA-Z]*>/",/g;
    s/</"/g;
    s/>/":"/g;
    s/[[:space:]]//g;
    s/,$//g;
    p
    }'

And finally, here is my current output:

"courseName":"FamilyandMedicalLeave","courseCode":"UBM2C","courseAuthor":"Alchemy","courseVersion":"2.0.021","importLocale":"en-US"

I know I need to replace [[:space:]] with something else as I don't want text in my quotes to run together, but I am stuck. For example: Family and Medical Leave should keep its spaces. There is probably also an easier way to do this with some XML to JSON script. However, I need to do this without needing to install anything else onto the servers.

Shawn Dibble
  • Why re-invent the wheel when you can use the ones existing already https://github.com/hay/xml2json – Inian Jul 18 '18 at 14:57

2 Answers

Note: I don't know all the details of XML and JSON. Since you specify that you cannot install a program, here are some steps using sed and paste that might help you. This is intended as a guide and may not be the full answer you are expecting, and it assumes the data is formatted as shown in the sample.

Step 1: getting required lines (See How to select lines between two patterns? for details)

$ sed -n '/<derivedFrom>/, /<\/derivedFrom>/{//!p}' ip.txt
    <courseName>Family and Medical Leave</courseName>
    <courseCode>FML</courseCode>
    <courseAuthor>Company 1</courseAuthor>
    <courseVersion>2.0.0</courseVersion>
    <importLocale>en-US</importLocale>
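
The empty pattern // reuses the most recently applied regular expression, so //!p prints every line of the range except the two boundary lines. A rough equivalent that spells both boundaries out, in case the empty-pattern shorthand behaves differently in your sed (the function name lines_inside_derived_from is made up for illustration):

```shell
# Hypothetical helper: print only the lines strictly between
# <derivedFrom> and </derivedFrom>. Reads the named files, or
# standard input when no file is given.
lines_inside_derived_from() {
    sed -n '/<derivedFrom>/,/<\/derivedFrom>/{
        /<derivedFrom>/d
        /<\/derivedFrom>/d
        p
    }' "$@"
}
```

It would be called as lines_inside_derived_from ip.txt.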

Step 2: Re-format the filtered lines by piping them through a second sed
This can also be combined with the previous step by replacing //!p with //!s|.*<\([^>]*\)>\(.*\)</\1>.*|"\1":"\2"|p

sed 's|.*<\([^>]*\)>\(.*\)</\1>.*|"\1":"\2"|'
"courseName":"Family and Medical Leave"
"courseCode":"FML"
"courseAuthor":"Company 1"
"courseVersion":"2.0.0"
"importLocale":"en-US"

Step 3: join them using paste (the trailing - names standard input explicitly, which some paste implementations require)

paste -sd, -
"courseName":"Family and Medical Leave","courseCode":"FML","courseAuthor":"Company 1","courseVersion":"2.0.0","importLocale":"en-US"
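
Putting the three steps together, here is a minimal sketch. The function name derived_from_to_json is hypothetical, and the surrounding braces are an assumption about what the final JSON string should look like. It assumes GNU sed, one element per line as in the sample, and values that contain no double quotes or backslashes, since nothing is escaped:

```shell
# Hypothetical helper: turn the children of <derivedFrom> into a
# one-line JSON object. Reads the named files, or standard input.
derived_from_to_json() {
    # Steps 1+2: keep only the lines inside the range and rewrite
    #            <tag>value</tag> as "tag":"value".
    # Step 3:    join the pairs with commas; the trailing - makes
    #            paste read standard input.
    # The braces added by printf are an assumption, not part of the
    # original steps.
    printf '{%s}\n' "$(
        sed -n '/<derivedFrom>/,/<\/derivedFrom>/{//!s|.*<\([^>]*\)>\(.*\)</\1>.*|"\1":"\2"|p}' "$@" |
            paste -sd, -
    )"
}
```

It would be called as derived_from_to_json ip.txt.
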
Sundeep
    Thanks. This is much cleaner and nicer than what I was using. I, unfortunately, couldn't get `paste` to work, but I was able to use another sed to remove the line breaks and replace them with commas. – Shawn Dibble Jul 18 '18 at 16:22
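
For reference, one common way to join lines with commas using only sed. This is a sketch assuming GNU sed, not necessarily the exact command used in the comment above, and join_with_commas is a made-up name:

```shell
# Hypothetical helper: join all input lines into one comma-separated line.
join_with_commas() {
    # :a      define label a
    # N       append the next input line to the pattern space
    # $!ba    unless this is the last line, jump back to a
    # s/\n/,/g  replace every embedded newline with a comma
    sed ':a;N;$!ba;s/\n/,/g' "$@"
}
```

It would take the place of paste at the end of the pipeline, for example sed ... ip.txt | join_with_commas.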

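Why not simply drop the s/[[:space:]]//g command and trim the leading and trailing spaces of each line with a second sed instead? This works in Bash: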

sed -n '
    /<derivedFrom>/ {
    :a;
    N;
    /<\/derivedFrom>/!ba;
    s/.*<derivedFrom>//;
    s/<\/derivedFrom>//;
    s/<\/[a-zA-Z]*>/",/g;
    s/</"/g;
    s/>/":"/g;
    s/,$//g;
    p
    }' input.txt | sed 's/^ *//g;s/ *$//g'
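
A variant that stays in a single sed call, as a sketch assuming GNU sed: instead of deleting every whitespace character, delete only the newlines together with the indentation around them, so the spaces inside the values survive. The function name derived_from_pairs is hypothetical:

```shell
# Hypothetical helper: print the children of <derivedFrom> as
# comma-separated "tag":"value" pairs on one line.
derived_from_pairs() {
    # 1. Collect the lines from <derivedFrom> through </derivedFrom>.
    # 2. Strip the wrapper tags.
    # 3. Remove each newline plus the whitespace around it (this
    #    replaces the original s/[[:space:]]//g).
    # 4. Turn the remaining tags into quotes, colons and commas.
    # 5. Drop the trailing comma.
    sed -n '
        /<derivedFrom>/ {
        :a
        N
        /<\/derivedFrom>/!ba
        s/.*<derivedFrom>//
        s/<\/derivedFrom>.*//
        s/[[:space:]]*\n[[:space:]]*//g
        s/<\/[a-zA-Z]*>/",/g
        s/</"/g
        s/>/":"/g
        s/,$//
        p
        }' "$@"
}
```

It would be called as derived_from_pairs input.txt.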

Regards!

Matias Barrios