I've been reading about deleting duplicate lines all over Stack Overflow. There are perl, awk, and sed solutions, but none are as specific as what I need, and I'm at a loss.
I want to delete the duplicate <upath> tags from this XML case-INSENSITIVELY with a quick bash/shell or perl one-liner. All other duplicate lines (like the <start> and <end> dates) must stay intact!
Input XML:
<package>
<id>1523456789</id>
<models>
<model type="A">
<start>2016-04-20</start> <------ Duplicate line to keep
<end>2017-04-20</end> <------ Duplicate line to keep
</model>
<model type="B">
<start>2016-04-20</start> <------ Duplicate line to keep
<end>2017-04-20</end> <------ Duplicate line to keep
</model>
</models>
<userinterface>
<upath>/Example/Dir/Here</upath>
<upath>/Example/Dir/Here2</upath>
<upath>/example/dir/here</upath> <------ Duplicate line to REMOVE
</userinterface>
</package>
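To be explicit: the desired output is identical to the input except that the flagged <upath> line is gone, so the <userinterface> block would end up as:

<userinterface>
<upath>/Example/Dir/Here</upath>
<upath>/Example/Dir/Here2</upath>
</userinterface>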
So far I've been able to grab the duplicate lines, but I don't know how to remove them. The following command:
grep -H path *.[Xx][Mm][Ll] | sort | uniq -id
gives this result:
test.xml: <upath>/example/dir/here</upath>
How do I remove that line now?
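For a single hard-coded line I can manage it (assuming GNU sed for in-place editing; the \#...# form just uses # as the regex delimiter so the slashes in the path don't need escaping), but I don't want to paste the path in by hand:

# delete the one literal duplicate line reported above, in place (GNU sed)
sed -i '\#^<upath>/example/dir/here</upath>$#d' test.xml

What I'm really after is a command that finds and removes the duplicate on its own.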
Using either the perl or awk version below erases the duplicate <start> and <end> dates as well.
perl -i.bak -ne 'print unless $seen{lc($_)}++' test.xml
awk '!a[tolower($0)]++' test.xml > test.xml.new
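My best guess is that the seen-check has to be guarded so only <upath> lines are deduplicated and everything else prints unconditionally, something like the sketches below, but I don't know if this is the right approach:

# awk: only lines containing <upath> go through the case-insensitive
# seen-check; all other lines (including the duplicate dates) print as-is
awk '!/<upath>/ { print; next } !a[tolower($0)]++' test.xml > test.xml.new

# perl equivalent, editing in place with a backup like my attempt above
perl -i.bak -ne 'print unless /<upath>/ && $seen{lc($_)}++' test.xml

Is that the right way to do it, or is there a cleaner one-liner?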