I am having trouble with the Linux command sed

I have an HTML document from which I want to extract the body text and copy it into a new file. I am trying to do it with sed:

sed '^<body>(.*)<\body>$/p' source.html > bodyextracted

However, it does not seem to work.

Amber
  • The canonical source on using regex to parse HTML: http://stackoverflow.com/a/1732454/4070984 – Ben Feb 15 '16 at 23:06
  • If you accept answers to your previous questions (http://stackoverflow.com/users/5919843/amber), or explain why they don't solve your problem, you might find more people willing to try to help you with future questions. – Ed Morton Feb 16 '16 at 13:36

1 Answer

sed -n '/<body>/,/<\/body>/p' source.html > bodyextracted

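The -n option suppresses sed's default output, and the p command then prints only the lines in the address range: everything from the first line containing <body> through the next line containing </body>, inclusive.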
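For context, the command in the question fails because its expression is missing the opening / and because sed works one line at a time, so .* cannot match across several lines. Below is a minimal sketch of the range approach on a small sample file. The file contents and the bodyonly output name are made up for illustration, and the second command (which drops the <body> and </body> lines themselves) is an optional variant, not something the question asked for.

```shell
# Create a small sample input (hypothetical content, for illustration only).
cat > source.html <<'EOF'
<html>
<head><title>Demo</title></head>
<body>
<p>Hello</p>
</body>
</html>
EOF

# Print only the lines from <body> through </body>, inclusive.
sed -n '/<body>/,/<\/body>/p' source.html > bodyextracted

# Variant: same range, but delete the two tag lines and print the rest.
sed -n '/<body>/,/<\/body>/{/<body>/d;/<\/body>/d;p;}' source.html > bodyonly
```

Note that /<body>/ only matches a bare <body> tag. If the tag carries attributes, the start pattern would need adjusting, for example to /<body[ >]/. For anything beyond simple, well-formatted input, a real HTML parser is the safer choice.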

Joao Morais