I have an .xlsx file which I want to pretty-print so that I get readable diffs rather than just "binary files changed". My approach is to `unzip` the entire thing. The resulting string does not contain line breaks, so I ran it through `xmllint --format`. But on this seemingly simple path I have run into several issues, which I have already spent hours on:
`unzip` outputs the multiple files inside the archive one after another. This results in invalid XML. Even with `unzip -q` options I get multiple DTDs and so on. xmllint breaks on this without formatting the input. The command I used: `unzip -c -a -q myFile.xlsx | xmllint --format -`
I tried splitting the XML into an array using `read` in order to feed each individual XML file to xmllint. In the result of `read`, most array items seem to be empty, and the third and fourth items contain twenty-something characters of the XML string. The command I used: `IFS='\<\?xml' read -r -a files <<< "$decompressed"`
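A possible explanation for the empty items: as far as I understand it, bash treats `IFS` as a set of single-character separators, not as one multi-character delimiter, so each of `\`, `<`, `?`, `x`, `m` and `l` splits the input on its own. A small sketch that reproduces the effect on a made-up sample string (the variable names `sample` and `parts` are only for illustration):

```bash
#!/usr/bin/env bash
# Hypothetical sample standing in for the decompressed output.
sample='<?xml version="1.0"?><a/>'

# Every character listed in IFS acts as its own separator,
# so the leading "<?xml" alone already produces several empty fields.
IFS='\<\?xml' read -r -a parts <<< "$sample"

printf 'number of fields: %d\n' "${#parts[@]}"
printf '[%s]\n' "${parts[@]}"
```

Running this shows a run of empty `[]` fields before the first non-empty one, which looks like what I am seeing with the real data.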
I also tried just inserting line breaks with `sed`, but the file is so large that processing takes too long to be feasible for diffing. The expression I used: `${decompressed/\>\</\>\n\</g}`
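Looking at it again, that expression is bash parameter expansion rather than sed: a single `/` replaces only the first match, and a trailing `/g` is sed syntax that bash does not interpret as a flag. A minimal sketch of the replace-all form on a small made-up string, assuming bash; the variable names `s`, `nl` and `out` are only for illustration:

```bash
#!/usr/bin/env bash
# Hypothetical one-line XML standing in for the decompressed data.
s='<a><b>x</b></a>'

# A real newline character; \n inside the expansion is not one.
nl=$'\n'

# The double slash replaces every "><" pair, not just the first.
out=${s//></>${nl}<}

printf '%s\n' "$out"
```

This only adds line breaks and does no indentation, so it is not a substitute for `xmllint --format`.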
I have just run out of ideas so I decided to consult you guys! Thanks ahead :)