sed/regex: How to match a '<' or '>' in a string

Question

I'm looking to match all less-than ('<') or greater-than ('>') signs in a file using sed. I only want to match the single character.

My goal is to replace them with ' <' and '> ' respectively (so each tag has whitespace around it and I can parse it more easily).

For example, it would match this (without the spaces inside the tags):

< p >Hey this is a paragraph.< /p >< p >And here is another.< /p >

.. and turn it into (note the spaces)

 < p > Hey this is a paragraph. < /p >  < p > And here is another. < /p >

Here's what my initial (wrong) guess was:

sed 's/<{1}|>{1}/ <> /' ...

It matches the whole word/line, which is not desired, and it also does not replace correctly.

Anyways, any help would be appreciated! Thanks!

You _really_ don't want to parse HTML with regular expressions. Use an HTML parser. (see http://stackoverflow.com/a/1732454/ which is one of the most-upvoted answers on SO for good reason) — Wooble, Dec 21 '11 at 15:11
@Wooble: while I generally agree to your assertion, using regexes can still be okay for testing and the like ... if it exceeds this, use a proper parser, though. — 0xC0000022L, Dec 21 '11 at 15:27
Haha yeah, I know. I've seen that one. I'm writing a toy academic HTML formatter in perl for a very small subset of tags. I am just using sed and regex to ensure it has the whitespace that my perl code needs. — jiman, Dec 21 '11 at 15:30

score 8 · Accepted Answer · answered Dec 21 '11 at 15:14

Try two substitutions to make it easier:

sed 's/</ </g ; s/>/> /g' file
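A quick way to check this against the sample from the question is sketched below. The function name `pad_angle_brackets` is made up for illustration, and the two substitutions are passed as separate `-e` expressions, which should behave the same as the `;`-separated one-liner above. Each substitution matches one literal character, so no alternation or `{1}` quantifier is needed.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not from the answer):
# reads text on stdin and pads angle brackets with spaces.
pad_angle_brackets() {
    # First expression: put a space before every '<'.
    # Second expression: put a space after every '>'.
    # The g flag applies each substitution to every match on the line.
    sed -e 's/</ </g' -e 's/>/> /g'
}

# Sample input from the question, written without spaces inside the tags.
printf '%s\n' '<p>Hey this is a paragraph.</p><p>And here is another.</p>' | pad_angle_brackets
```

Because each bracket is padded on one side only, adjacent tags such as `</p><p>` end up with two spaces between them, which matches the desired output in the question.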

sidyll

or you can do this - `sed -e 's/</ &/g' -e 's/>/& /g' file` – jaypal singh Dec 21 '11 at 17:07
