Creating bash script to parse xml file to csv

Question

I'm trying to create a bash script that parses an XML file and saves the result to a CSV file.

For example:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <List>
    <Job id="1" name="John/>
    <Job id="2" name="Zack"/>
    <Job id="3" name="Bob"/>
</List>

I would like the script to save the information to a CSV file like this:

John | 1
Zack | 2
Bob  | 3

The name and the id should each go in a separate cell.

Is there any way I can do this?

Might have just edited the old question (http://stackoverflow.com/q/21495533/3076724) rather than posting a new one, but you should definitely at least link to it when posting similar questions. — Reinstate Monica Please, Feb 02 '14 at 06:43
Duplicate: https://stackoverflow.com/questions/14368347/convert-xml-file-to-csv-in-shell-script — Vanuan, Oct 26 '17 at 00:26

score 5 · Answer 1 · edited May 23 '17 at 11:56

You've posted a query similar to your previous one. I'd again suggest using an XML parser. You could say:

xmlstarlet sel -t -m //List/Job -v @name -o "|" -v @id -n file.xml

It would return

John|1
Zack|2
Bob|3

for your sample data.

Pipe the output to sed: sed "s/|/\t| /" if you want it to appear as in your example.
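
If you want the padded layout from the question rather than a tab, a minimal sketch using printf could look like the following. The helper name align_pipe and the default width of 4 are assumptions chosen to fit the sample names; it only reformats name|id lines read from stdin.

```shell
#!/bin/bash
# Hypothetical helper: pad the name column so the "|" separators line up.
# Reads "name|id" lines on stdin; the width (default 4) is an assumption.
align_pipe() {
  local width=${1:-4} name id
  while IFS='|' read -r name id; do
    # %-Ns left-aligns the name and pads it with spaces to N characters
    printf "%-${width}s | %s\n" "$name" "$id"
  done
}

# Stand-in for the xmlstarlet output shown above
printf 'John|1\nZack|2\nBob|3\n' | align_pipe
```

In practice you would pipe the xmlstarlet command into align_pipe, passing a larger width if your names are longer.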

edited May 23 '17 at 11:56

Community

answered Feb 02 '14 at 06:57

devnull

score 2 · Answer 2 · answered Feb 02 '14 at 06:40

Try something like this

#!/bin/bash
while read -r line; do
  [[ $line =~ "name=\""(.*)"\"" ]] && name="${BASH_REMATCH[1]}" && [[ $line =~ "Job id=\""([^\"]+) ]] &&  echo "$name | ${BASH_REMATCH[1]}"
done < file

The line with John is malformed (the closing quote after John is missing). With it fixed, the output is:

John | 1
Zack | 2
Bob | 3
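
For anyone who finds the one-liner hard to read, here is a rough sketch of the same regex idea with one step per line. The function name parse_jobs and the two pattern variables are made up for illustration, and it assumes one Job element per line with well-formed, double-quoted attributes.

```shell
#!/bin/bash
# Sketch: same approach as the loop above, spelled out step by step.
# Assumes one <Job .../> per line and well-formed, double-quoted attributes.
parse_jobs() {
  local line name id
  local name_re='name="([^"]*)"'   # group 1 = text between the quotes
  local id_re='id="([^"]*)"'
  while IFS= read -r line; do
    [[ $line =~ $name_re ]] || continue   # skip lines without a name attribute
    name=${BASH_REMATCH[1]}               # save it; the next match overwrites BASH_REMATCH
    [[ $line =~ $id_re ]] || continue     # skip lines without an id attribute
    id=${BASH_REMATCH[1]}
    echo "$name | $id"
  done
}

parse_jobs <<'EOF'
<List>
<Job id="1" name="John"/>
<Job id="2" name="Zack"/>
</List>
EOF
```

Keeping the patterns in variables avoids the quoting gymnastics inside [[ ]], since an unquoted variable on the right of =~ is treated as a regex.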

answered Feb 02 '14 at 06:40

Reinstate Monica Please

1

In this instance, `name="John/>`, there is no double quote after John, so I recommend replacing `[[ $line =~ "name=\""(.*)"\"" ]]` with `[[ $line =~ "name=\""([^\"|/]*) ]]` – BMW Feb 03 '14 at 05:24
2

@BMW Thanks. I assumed it shouldn't be malformed xml, but if it is could do that or something like `([A-Za-z]*)` – Reinstate Monica Please Feb 03 '14 at 05:33
dude, can u elaborate on that short script? I am quite confused. :) nevertheless its looking crazy good. – Dominik May 02 '16 at 11:47

Vanuan · Answer 3 · 2019-08-30T08:56:46.823

2

Extending the xmlstarlet approach:

Given this XML file as input:

<DATA>
  <RECORD>
    <NAME>John</NAME>
    <SURNAME>Smith</SURNAME>
    <CONTACTS>
      "Smith" LTD,
      London, Mtg Str, 12,
      UK
    </CONTACTS>
  </RECORD>
</DATA>

And this script:

xmlstarlet sel -e utf-8 -t \
  -o "NAME, SURNAME, CONTACTS" -n \
  -m //DATA/RECORD \
  -o "\"" \
  -v "str:replace(normalize-space(NAME), '\"', '\"\"')" -o "\",\"" \
  -v "str:replace(normalize-space(SURNAME), '\"', '\"\"')" -o "\",\"" \
  -v "str:replace(normalize-space(CONTACTS), '\"', '\"\"')" \
  -o "\"" \
  -n file.xml

You'll have the following output:

NAME, SURNAME, CONTACTS
"John","Smith","""Smith"" LTD, London, Mtg Str, 12, UK"
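
The quoting rule that the str:replace calls apply (double every embedded quote, then wrap the field in quotes) can also be sketched in plain bash, which may help when checking the output by hand. The helper name csv_field is hypothetical and not part of xmlstarlet.

```shell
#!/bin/bash
# Hypothetical helper: quote one value as a CSV field.
# Embedded double quotes are doubled, then the whole value is wrapped in quotes.
csv_field() {
  local value=$1 q='"'
  printf '"%s"' "${value//$q/$q$q}"
}

# The CONTACTS value from the example, after whitespace normalization
csv_field '"Smith" LTD, London, Mtg Str, 12, UK'; echo
```

This should print the same third field as the xmlstarlet output: """Smith"" LTD, London, Mtg Str, 12, UK" including the outer quotes.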

edited Aug 30 '19 at 08:56

answered Oct 26 '17 at 00:24

Vanuan

This is a good solution, and elegant. Just I got: compilation error: element with-param XSLT-with-param: Failed to compile select expression 'str:replace' because of unclosed parenthesis in normalize-space call; should read "str:replace(normalize-space(NAME) , '\"', '\"\"')" – Diego1974 Aug 29 '19 at 09:21
Thanks for this. Anyone else extracting URLs from XML may find the `&` isn't escaped. Fix this by adding `-T` after the `sel` command, e.g. `xmlstarlet sel -T -e utf-8......` (see https://stackoverflow.com/questions/46255304/unescape-the-ampersand-via-xmlstarlet-bugging-amp) – Neek Mar 11 '22 at 06:29

BMW · Answer 4 · 2014-02-03T05:29:48.453

1

Using sed

sed -nr 's/.*id=\"([0-9]*)\"[^\"]*\"(\w*).*/\2 | \1/p' file

Additionally, based on BroSlow's script, I merged the two matches into a single regex.

#!/bin/bash

while read -r line; do
  [[ $line =~ id=\"([0-9]+).*name=\"([^\"|/]*) ]] && echo "${BASH_REMATCH[2]} | ${BASH_REMATCH[1]}"
done < file

edited Feb 03 '14 at 05:29

answered Feb 02 '14 at 10:30

BMW
