's:</i><i>::g'
That captures the basic idea, which is to get rid of an end tag immediately followed by a new start tag, but it fails in three cases:
(1) Capitalised tags:
Replace 'i' with '[Ii]' to match capitalised HTML tags. If you want the capitalisation to remain as it was, rather than replacing it with a lowercase i, put the match within a \(group\) and use \1 on the output side of the sed command.
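As a rough illustration of both points, here is a small sketch. The sample strings are made up, and the second command is only there to show how \1 keeps the matched case; it is not part of the final script.

```shell
# Sample line with mixed-case tags (invented for illustration).
line='<I>one</I><i>two</i>'

# [Ii] matches either case, so the mixed pair </I><i> is removed.
merged=$(printf '%s\n' "$line" | sed 's:</[Ii]><[Ii]>::g')
echo "$merged"

# A \(group\) remembers what was matched; \1 writes it back unchanged.
# Here the letter keeps its case while the spaces before > are dropped.
kept=$(printf '%s\n' '<I >text</I>' | sed 's:<\([Ii]\) *>:<\1>:g')
echo "$kept"
```

The first command prints `<I>onetwo</i>` and the second prints `<I>text</I>`.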
(2) Whitespace between the tags: to replace any number of spaces with a single space, we put an optional match group around the first space and write that group to the output:
's:</i>\(\ \)\?\s*<i>:\1:g'
The space, the parentheses and the question mark are escaped with a backslash; the forward slash needs no escaping because the command uses : as its delimiter. The g at the end of each replacement allows it to match multiple times on each line.
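To see what the optional group does, here is a quick check on two made-up lines. It assumes GNU sed, since \? and \s are GNU extensions to basic regular expressions.

```shell
# Same expression as above, stored once so both runs use it unchanged.
expr='s:</i>\(\ \)\?\s*<i>:\1:g'

# Three spaces between the tags: the first is captured, the rest go to \s*.
spaced=$(printf '%s\n' '<i>one</i>   <i>two</i>' | sed "$expr")
echo "$spaced"

# No whitespace at all: the group matches nothing, so \1 is empty.
tight=$(printf '%s\n' '<i>one</i><i>two</i>' | sed "$expr")
echo "$tight"
```

The first line comes out as `<i>one two</i>` and the second as `<i>onetwo</i>`.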
(3) Whitespace inside the tags: this should be matched with \s, which captures both tabs and spaces. Oddly enough, whitespace is allowed before the final > but not elsewhere in the tag. However, if a tag spans multiple lines you are out of luck: matching across lines is possible in sed, but it turns this into a script that is much too long for a single line.
After modifying for all three cases, the script line becomes:
sed -i 's:</[Ii]\s*>\(\ \)\?\s*<[Ii]\s*>:\1:g' yourfile.html
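If you want to try the full command before pointing it at a real file, something like the following should do. It is only a sketch: the file contents are invented, the temporary file stands in for yourfile.html, and GNU sed is assumed for -i, \s and \?.

```shell
# Work on a throwaway file; the contents are invented for illustration.
tmp=$(mktemp)
printf '%s\n' '<I>one</I > <i >two</i>' '<i>a</i><i>b</i>' > "$tmp"

# The complete command from above, applied in place (GNU sed syntax).
sed -i 's:</[Ii]\s*>\(\ \)\?\s*<[Ii]\s*>:\1:g' "$tmp"

# Read the result back and clean up.
result=$(cat "$tmp")
echo "$result"
rm -f "$tmp"
```

The first line becomes `<I>one two</i>` (one space kept) and the second becomes `<i>ab</i>`.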
A note about -i (in-place replacement): this is an option in GNU sed, not standard sed. OS X has -i, but needs an extra '' parameter after the -i. If your sed does not support -i, you need to redirect the output to a new file, then mv that file to replace the original:
sed 'the same command' yourfile.html > newfile.html
mv newfile.html yourfile.html
See the SO question: 'sed edit file in place' for more information on that.