How can I use sed to separate the body from the rest of an HTML document?

Question

I am having trouble with the Linux command sed.

I have an HTML document from which I want to extract the body text and copy it into a new file, and I want to do it with sed:

sed '^<body>(.*)<\body>$/p' source.html > bodyextracted

but it didn't seem to work.

The canonical source on using regex to parse HTML: http://stackoverflow.com/a/1732454/4070984 — Ben, Feb 15 '16 at 23:06
If you accept answers to your previous questions (http://stackoverflow.com/users/5919843/amber) or explain why they don't solve your problem you might find more people willing to try to help you with future questions. — Ed Morton, Feb 16 '16 at 13:36

score 1 · Answer 1 · answered Feb 15 '16 at 23:02


sed -n '/<body>/,/<\/body>/p' source.html > bodyextracted

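This prints everything from the first occurrence of <body> through the next occurrence of </body>. The -n option suppresses sed's default output, so only the lines in that range are printed. Because sed works line by line, the whole lines containing the two tags are included in the output.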
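A minimal sketch of the range command on a small file may make the behaviour easier to see. It assumes that <body> and </body> each sit on a line of their own and that the opening tag has no attributes. The helper name extract_body and the sample file contents are made up for illustration, and the second sed call, which drops the two tag lines, is an addition that is not part of the answer.

```shell
# extract_body FILE: print the lines strictly between the <body> line and
# the </body> line. Hypothetical helper name, used only for this sketch.
extract_body() {
  # First sed: print the range, tag lines included (the answer's command).
  # Second sed: delete the first and last line of that range (the tag lines).
  sed -n '/<body>/,/<\/body>/p' "$1" | sed '1d;$d'
}

# Build a made-up sample document in a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/source.html" <<'EOF'
<html>
<head><title>Demo</title></head>
<body>
<p>Hello</p>
</body>
</html>
EOF

extract_body "$tmp/source.html" > "$tmp/bodyextracted"
cat "$tmp/bodyextracted"
```

If the tags share a line with other markup, that markup is printed or dropped together with the tag, since sed selects whole lines here. For documents like that, an HTML parser is likely a better fit, as the comment on the question suggests.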


Joao Morais
