0

I have an XML file with a lot of data in it, but some of the tags are split across several lines instead of sitting on the same line as their content. I need to fix this using a shell script.

Input

<lineid>Product 
testing machine 
</lineid>

Expected Output

<lineid>Product testing machine </lineid>

I have shown the input with the extra line breaks, because otherwise it would look the same as the output.

The input data is not on a single line and I want it on a single line. I also want the changes to be made in the same file.

Fravadona
Amey K

2 Answers

1

This should put the whole file onto one line and remove extra spaces. It expects a filename as its argument, so if you save this script as formatter.sh and the input file as input.txt, you would call it as:

./formatter.sh input.txt

The output gets saved to the same file, so make sure to try it on a copy!

#!/bin/bash

input_file="$1"  # path to the file that will be rewritten in place

if [ -f "$input_file" ]; then
    # Delete newlines, trim trailing spaces, squeeze runs of spaces into one
    formatted=$(tr -d '\n' < "$input_file" | sed -e 's/ *$//' -e 's/  */ /g')
    printf '%s\n' "$formatted" > "$input_file"
else
    echo "Input file not found: $input_file" >&2
    exit 1
fi
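
Since the script overwrites its input, one way to lower the risk is to write the result to a temporary file first and only touch the original once the pipeline has succeeded. The following is a minimal sketch under that assumption; `flatten_in_place` is a hypothetical helper name, not part of the script above.

```shell
#!/bin/bash
# Sketch only: flatten_in_place is a hypothetical helper name.
# It flattens the file to one line like the script above, but the
# original is only overwritten after the pipeline has succeeded.
flatten_in_place() {
    local file="$1" tmp
    [ -f "$file" ] || { echo "Input file not found: $file" >&2; return 1; }
    tmp=$(mktemp) || return 1
    # Delete newlines (then add one back so the output ends with a newline),
    # squeeze runs of spaces, and trim trailing spaces.
    if { tr -d '\n' < "$file"; echo; } | sed -e 's/  */ /g' -e 's/ *$//' > "$tmp"; then
        cat "$tmp" > "$file"   # copying back keeps the original's permissions
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

After sourcing this, `flatten_in_place input.txt` would rewrite input.txt; trying it on a copy first is still a good idea.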
Adrian Mole
hermanoff
1

As I understand your request, tags of simple XML can be condensed with something like this:

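#!/bin/bash

if [ $# -lt 1 ]; then echo "no file provided" >&2; exit 1; fi
xml_input="$1"
if [ ! -r "${xml_input}" ]; then echo "file not readable" >&2; exit 1; fi
# Build the template from the base name only, so paths containing "/" work
xml_temp="$(mktemp "/tmp/$(basename "${xml_input}").XXXXXXXXX")" || exit 1

# Join all lines into one, then normalise whitespace.
# Note: "sed -i" without a suffix and "\n" in the replacement require GNU sed.
tr '\n' ' ' < "${xml_input}" > "${xml_temp}"
sed -i 's/\r/ /g' "${xml_temp}"       # carriage returns become spaces
sed -i 's/  */ /g' "${xml_temp}"      # squeeze runs of spaces
sed -i 's/?> /?>/g' "${xml_temp}"     # drop the space after "?>"
sed -i 's/?>/?>\n/g' "${xml_temp}"    # line break after "?>"
sed -i 's/> </>\n</g' "${xml_temp}"   # line break between adjacent tags
mv "${xml_temp}" "${xml_input}"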

which will convert:

<?xml version="1.0" encoding="UTF-8"?><root>

<lineid>
     Product  
     testing machine  
     
     </lineid>
                    <lineid>Product testing machine

                    </lineid>
    </root> 
    

to:

<?xml version="1.0" encoding="UTF-8"?>
<root>
<lineid> Product testing machine </lineid>
<lineid>Product testing machine </lineid>
</root>

But a proper shell script that handles all XML cases would be huge, or would just be a wrapper around an actual parser written in another language. There are a lot of good explanations:

https://stackoverflow.com/a/8577108/1919793

Can you provide some examples of why it is hard to parse XML and HTML with a regex?

Why it's not possible to use regex to parse HTML/XML: a formal explanation in layman's terms

and many text editors will do this a lot better for you:

How do I format XML in Notepad++?
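
For the narrow case in the question, where one known element is split across lines and the rest of the file should keep its line breaks, a smaller sketch with awk may be enough. This is not a parser: it assumes the element (here `lineid`) has no attributes, is not nested inside itself, and that nothing of interest follows the closing tag on the same line. `join_element` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: join_element is a hypothetical helper, not a real XML tool.
# Usage: join_element TAG FILE   (joined result goes to stdout)
join_element() {
    local tag="$1" file="$2"
    awk -v otag="<$tag>" -v ctag="</$tag>" '
        { sub(/\r$/, "") }                      # tolerate CRLF line endings
        !inside {
            if (index($0, otag) && !index($0, ctag)) {
                # Element opens here but closes on a later line: start buffering
                sub(/[ \t]+$/, "")
                buf = $0
                inside = 1
            } else {
                print                           # any other line passes through
            }
            next
        }
        {
            line = $0
            gsub(/^[ \t]+|[ \t]+$/, "", line)   # strip indentation and trailing blanks
            if (line != "") buf = buf " " line  # skip lines that were only whitespace
            if (index(line, ctag)) { print buf; inside = 0 }
        }
        END { if (inside) print buf }           # unterminated element: emit what we have
    ' "$file"
}
```

To change the file itself, the output could be written to a temporary file and moved back, for example `join_element lineid input.xml > input.xml.tmp && mv input.xml.tmp input.xml`.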

theSparky