I have a large, externally generated XML file that contains some invalid characters (a backslash, in my case). I know what to replace these fields with, so I could open a single file in gedit and fix it by hand. However, there are many of these files, all with the same problem, so I would like to write a bash script to fix them all.
Problem
The problematic section looks like this:
<root>
<array>
<dimension> dim="1">gridpoints</dimension>
<field> a </field>
<field> b </field>
<field> c </field>
<field> \00\00\00 </field>
<field> \00\00\00 </field>
<field> \00\00\00 </field>
<set>
All the data
</set>
</array>
</root>
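The perl filter shown further down deletes characters that XML 1.0 forbids, such as NUL (0x00), but it leaves a backslash alone because a backslash is a legal character. If that filter makes the problem go away, the \00 that gedit displays is probably NUL bytes rather than literal backslashes. A minimal sketch for checking which one a file contains (inspect_bad_chars is a hypothetical helper name):

```shell
#!/bin/bash
# Report how many NUL bytes and how many literal backslashes a file contains.
# If NUL bytes > 0, the "\00" shown by the editor is most likely NUL bytes.
inspect_bad_chars() {
    local file=$1 nul backslash
    # tr -cd deletes every byte NOT in the given set, so only matches remain;
    # wc -c then counts them (tr -d ' ' drops any padding wc adds).
    nul=$(tr -cd '\000' < "$file" | wc -c | tr -d ' ')
    backslash=$(tr -cd '\\' < "$file" | wc -c | tr -d ' ')
    echo "NUL bytes: $nul"
    echo "backslashes: $backslash"
}

# Usage: inspect_bad_chars file.xml
```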
Desired output
<root>
<array>
<dimension> dim="1">gridpoints</dimension>
<dimension> dim="2">morepoints</dimension>
<dimension> dim="3">evenmorepoints</dimension>
<field> a </field>
<field> b </field>
<field> c </field>
<field> d </field>
<field> e </field>
<field> f </field>
<set>
All the data
</set>
</array>
</root>
Fix so far
I have already found a way to remove the offending backslashes using perl, but I can't figure out how to edit the fields individually: the code below runs, but every field ends up containing "a".
#!/bin/bash
perl -CSDA -pe'
s/[^\x9\xA\xD\x20-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+//g;
' file.xml > temp.xml
xmlstarlet ed -u "/root/array/field" -v "a" temp.xml > file_fixed.xml
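Every field ends up as "a" because the XPath /root/array/field selects all six field elements at once, and -u updates every node the XPath selects. A positional predicate such as field[4] selects a single node, and xmlstarlet ed accepts several actions in one call, so a sketch along these lines should edit the fields individually and add the missing dimensions. Piping perl straight into xmlstarlet also removes the temp.xml step. It assumes the real markup is <dimension dim="1">gridpoints</dimension> (the "<dimension> dim=" in the excerpt looks like a typo; written literally, xmlstarlet would escape the ">" in new text as &gt;) and that the broken fields are always the 4th to 6th field elements. fix_xml is a hypothetical name.

```shell
#!/bin/bash
# Sketch: strip XML-invalid characters, then edit each broken <field> on its
# own and add the two missing <dimension> elements, all in one xmlstarlet call.
# Assumes <dimension dim="1">gridpoints</dimension> markup and that the broken
# fields are always the 4th, 5th and 6th <field> children of /root/array.
fix_xml() {
    perl -CSDA -pe '
        s/[^\x9\xA\xD\x20-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+//g;
    ' "$1" |
    # Actions run left to right, so dimension[2] in the second action
    # already refers to the element appended by the first one.
    xmlstarlet ed \
        -a '/root/array/dimension[1]' -t elem -n dimension -v 'morepoints' \
        -i '/root/array/dimension[2]' -t attr -n dim -v '2' \
        -a '/root/array/dimension[2]' -t elem -n dimension -v 'evenmorepoints' \
        -i '/root/array/dimension[3]' -t attr -n dim -v '3' \
        -u '/root/array/field[4]' -v ' d ' \
        -u '/root/array/field[5]' -v ' e ' \
        -u '/root/array/field[6]' -v ' f '
}

# Usage: fix_xml file.xml > file_fixed.xml
```

xmlstarlet re-serializes the document, so whitespace and the XML declaration in the output may differ from the input even where no element was changed.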
I will also gladly take any advice on how to do this more efficiently. Thank you.
Edit
As requested by zdim, I have added an example that is more representative of the full file I am dealing with.
<root>
<path1>
<array>
<dimension> dim="1">gridpoints</dimension>
<field> a </field>
<field> b </field>
<field> c </field>
<field> \00\00\00 </field>
<field> \00\00\00 </field>
<field> \00\00\00 </field>
<set>
All the data
</set>
</array>
</path1>
<path2>
<array>
<dimension> dim="1">gridpoints</dimension>
<field> Behaves Correctly </field>
</array>
</path2>
</root>
It should be noted that I receive these files as output from another program and need to fix them before feeding them into the next one. I am not very experienced with XML, so I may have missed some obvious solutions.
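For the fuller example, starting the XPath at /root/path1/array means path2 is never selected, so it passes through unchanged. Because the files arrive in batches from another program, the repair can run as a loop over all of them before the next program reads them. A sketch, assuming <dimension dim="1"> markup and that the broken fields are the 4th to 6th field elements of /root/path1/array; fix_all and the _fixed.xml naming are hypothetical.

```shell
#!/bin/bash
# Sketch: repair /root/path1/array in every file given as an argument and
# write <name>_fixed.xml; nothing under /root/path2 is selected or changed.
fix_all() {
    local f arr='/root/path1/array'
    for f in "$@"; do
        # strip XML-invalid characters, then edit only the path1 array
        perl -CSDA -pe '
            s/[^\x9\xA\xD\x20-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+//g;
        ' "$f" |
        xmlstarlet ed \
            -a "$arr/dimension[1]" -t elem -n dimension -v 'morepoints' \
            -i "$arr/dimension[2]" -t attr -n dim -v '2' \
            -a "$arr/dimension[2]" -t elem -n dimension -v 'evenmorepoints' \
            -i "$arr/dimension[3]" -t attr -n dim -v '3' \
            -u "$arr/field[4]" -v ' d ' \
            -u "$arr/field[5]" -v ' e ' \
            -u "$arr/field[6]" -v ' f ' \
            > "${f%.xml}_fixed.xml"
    done
}

# Usage: fix_all *.xml
```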