Using sed or awk to select

Question

I'm trying to select the lines between two markers in an HTML file. I've tried using sed and awk, but I think there's an issue with the way I'm escaping some of the characters. I have seen some similar questions and answers, but the examples given are simple, with no special characters. I need the lines between

<div class="bread crumb">

and

</div>

There is no other div within the block and there are multiple lines within the block.

Do I need to escape the characters <, > and " as below?

sed -n -e '/^\<div class=\"bread crumb\"\>$/,/^\<\/div\>$/{ /^\<div class=\"bread crumb\">$/d; /^\<\/div>$/d; p; }'

My awk attempt:

awk '/\<div class=\"bread crumb\"\>/{flag=1;next}/\<\/div\>/{flag=0}flag'

While `sed` and `awk` might be able to do the job for some input, it is considered bad practice to use non-HTML aware tools to parse HTML. You should have a look at `xpath`, which is a dedicated tool to parse HTML/XML files — Aserre, Jul 21 '17 at 11:45
obligatory [don't parse html with regex](https://stackoverflow.com/a/1732454/7552) link — glenn jackman, Jul 21 '17 at 14:27

SLePort · Answer 1 · 2017-07-21T11:58:03.133

1

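You should use an HTML parser for that job.

If you still want to do it with sed, don't escape < and >. In GNU sed, \< and \> are word-boundary anchors, not literal angle brackets, so the escaped pattern never matches your markers. The double quotes need no escaping either, since the script is inside single quotes.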
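To see why the escaped form fails, here is a minimal sketch. It assumes GNU sed, where \< and \> are word-boundary anchors; other sed implementations may treat them differently. The one-line input is a made-up sample.

```shell
# Assumes GNU sed; the input line is a made-up sample.
line='<div class="bread crumb">'

# Escaped: \< consumes no characters, so the pattern no longer contains a
# literal "<" and expects "div" right at the start of the line. No match.
escaped=$(printf '%s\n' "$line" | sed -n '/^\<div class="bread crumb"\>$/p')

# Unescaped: < and > are ordinary characters and match literally.
literal=$(printf '%s\n' "$line" | sed -n '/^<div class="bread crumb">$/p')

printf 'escaped: [%s]\nliteral: [%s]\n' "$escaped" "$literal"
```

With GNU sed, the escaped variable should come back empty while the literal one holds the whole line.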

Try this:

sed -ne '/<div class="bread crumb">/,/<\/div>/{//!p;}' file

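The //!p part prints the whole block except the two marker lines. An empty regex // reuses the last regex that was applied, which here is whichever range address was just tested, so //!p prints only the lines that match neither address.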
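A small end-to-end sketch of that command, run against a made-up sample file (the file contents and the variable names are assumptions for illustration only):

```shell
# Build a throwaway sample file; its contents are invented for this demo.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<html>
<div class="bread crumb">
<a href="/">Home</a>
<a href="/docs">Docs</a>
</div>
<p>footer</p>
EOF

# Select the range, then skip the lines that match either range address.
selected=$(sed -n '/<div class="bread crumb">/,/<\/div>/{//!p;}' "$sample")
printf '%s\n' "$selected"

rm -f "$sample"
```

This should print only the two `<a ...>` lines. Note that the range ends at the first line containing </div>, which is fine here because the question states there is no nested div.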

edited Jul 21 '17 at 11:58

answered Jul 21 '17 at 11:45

SLePort


score 1 · Answer 2 · answered Jul 21 '17 at 11:46

Actually, you only need to escape the / in </div>, because / is the regex delimiter. The rest can stay as it is:

sed -n '/<div class="bread crumb">/,/<\/div>/{//!p}'
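If escaping the slash feels error-prone, sed also accepts a custom delimiter for an address regex, written as \%regex%. The sketch below uses % as the delimiter so that </div> needs no escaping at all; the sample input and variable names are made up for illustration.

```shell
# Made-up sample input held in a variable.
html='<div class="bread crumb">
<a href="/">Home</a>
</div>'

# \%...% delimits each address regex with % instead of /,
# so the / in </div> is just an ordinary character.
selected=$(printf '%s\n' "$html" |
  sed -n '\%<div class="bread crumb">%,\%</div>%{//!p;}')
printf '%s\n' "$selected"
```

Any character that does not occur in the pattern can serve as the delimiter; % is only one choice.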

Guru


score 0 · Answer 3 · answered Jul 21 '17 at 17:35

Just use string matches in awk, so that no character in the markers has any regex meaning:

awk '$0=="</div>"{f=0} f{print} $0=="<div class=\"bread crumb\">"{f=1} ' file
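Here is a quick sketch of how that behaves on a made-up sample (the input and variable names are assumptions). The order of the three rules matters: the flag is cleared before the print rule and set after it, so neither marker line is printed.

```shell
# Made-up sample input; the markers start in column 1 with no trailing spaces.
html='<html>
<div class="bread crumb">
<a href="/">Home</a>
<a href="/docs">Docs</a>
</div>
<p>footer</p>'

selected=$(printf '%s\n' "$html" | awk '
  $0 == "</div>"                      { f = 0 }  # closing marker: stop printing
  f                                   { print }  # inside the block: print the line
  $0 == "<div class=\"bread crumb\">" { f = 1 }  # opening marker: start on the next line
')
printf '%s\n' "$selected"
```

Because == compares the entire line, a marker that is indented or followed by trailing whitespace will not match; the regex-based sed commands are more forgiving in that case.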

Ed Morton

