sed -n 's;</\?td>;;gp' scoretable.html | \
sed -e 's;<td class="center">;;' \
-e 's;<.*>;;'
Note that I use `;` instead of `/` as my delimiter because I find it a bit easier to read. Sed will use whatever character you put after the `s` as the delimiter.
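To make the delimiter point concrete, here is a small sketch that runs the same substitution twice, once with `/` and once with `;`. The input line and the variable names are made up for illustration; they are not taken from the real scoretable.html.

```shell
# Hypothetical input line, not from the real scoretable.html.
line='<td>42</td>'

# With / as the delimiter, the slash inside </td> has to be escaped.
with_slash=$(printf '%s\n' "$line" | sed 's/<\/td>//')

# With ; as the delimiter, the slash is an ordinary character.
with_semicolon=$(printf '%s\n' "$line" | sed 's;</td>;;')

printf '%s\n' "$with_slash" "$with_semicolon"
```

Both commands should strip the closing tag in the same way; the second is just easier on the eyes when the pattern itself contains slashes.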
Okay, now the explanation. On the first line, `-n` suppresses sed's default output, but the `p` flag at the end of the command tells sed to print every line on which the substitution succeeded. This gets us only the lines that contain a `<td>` or `</td>` tag. At the same time, anything that matches `</\?td>` is replaced with nothing. `/\?` means the `/` is optional (it may appear zero times or once), so the pattern matches both the opening and the closing tag. Note that `\?` is a GNU sed extension to basic regular expressions. The `g` at the end, for global, means sed won't stop trying to match the pattern after it succeeds for the first time in a line. Without `g`, only the first match on each line would be removed, which here is the opening tag.
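Here is a rough sketch of how `-n`, `p` and `g` interact, assuming GNU sed (for `\?`). The sample rows are invented for the demonstration and are not the contents of scoretable.html.

```shell
# Hypothetical sample input; only the second line contains <td> tags.
sample='<tr>
<td>Alice</td><td>Bob</td>
<p>not a cell</p>
</tr>'

# -n plus the p flag: print only lines where a substitution happened.
# g: remove every <td> and </td> on the line, not just the first.
all=$(printf '%s\n' "$sample" | sed -n 's;</\?td>;;gp')

# Same command without g: only the first match on the line is removed.
first_only=$(printf '%s\n' "$sample" | sed -n 's;</\?td>;;p')

printf 'with g:    %s\n' "$all"
printf 'without g: %s\n' "$first_only"
```

With `g`, the cell line comes out with all four tags removed; without it, only the leading `<td>` goes away. The `<tr>` and `<p>` lines are never printed because nothing on them matches.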
The output from this is piped into sed again on the second line. `-e` just specifies that an editing command follows. If you're running only one command it's implied, but here I run two (the second is on the third line). The first removes `<td class="center">`, and the second removes any other tags (in this case the `<br>` tags).
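The second stage can be tried on its own with a sketch like the one below. The three input lines are an assumption about what the first sed might emit; the real output depends on the actual table.

```shell
# Hypothetical output of the first sed stage.
stage1='<td class="center">24<br>
Alice<br>
17'

# Both -e expressions are applied, in order, to every line:
# first the centered cell's opening tag, then whatever tag is left.
cleaned=$(printf '%s\n' "$stage1" |
  sed -e 's;<td class="center">;;' \
      -e 's;<.*>;;')

printf '%s\n' "$cleaned"
```

Lines that match neither expression, like the bare `17`, pass through unchanged, since this sed is run without `-n`.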
The last command is only safe if you're sure there's at most one tag on a line. Otherwise the `.*` will be greedy and match too much, so if it were run on its own against:
<td class="center">24 </ br>
it would match everything from the first `<` to the last `>`, which is the entire line, and remove everything.
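The greedy behaviour is easy to reproduce. The sketch below also shows one common alternative, `<[^>]*>` with the `g` flag, which matches one tag at a time; that variant is my suggestion and not part of the command above.

```shell
line='<td class="center">24 </ br>'

# Greedy: .* stretches from the first < to the last >, so the
# whole line is matched and removed.
greedy=$(printf '%s\n' "$line" | sed 's;<.*>;;')

# Alternative (not from the original command): [^>]* cannot cross
# a >, so each tag is matched separately and g removes them all.
per_tag=$(printf '%s\n' "$line" | sed 's;<[^>]*>;;g')

# Brackets make the empty result and the trailing space visible.
printf '[%s]\n' "$greedy" "$per_tag"
```

The first result is empty, while the second keeps the cell's text (including the space that sat before the last tag).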