I am trying to extract the text between the header tags <h1> and </h1>, and also between <h3> and </h3>, from a text file. The file has only one <h1> tag and four <h3> tags. I'm only interested in the first two <h3> tags, and specifically in the text enclosed by each opening and closing tag.

Each of these grep commands produces the correct output when run on its own, but I run into problems when I try to combine them.

grep  -o -P '(?<=<h1>).*(?=</h1>)' file.txt  
grep  -o -P '(?<=<h3>).*(?=</h3>)' file.txt  

I tried:
grep -e '(?<=<h1>).*(?=</h1>)'-e'(?<=<h3>).*(?=</h3>)'

grep -o '(?<=<h1>).*(?=</h1>)'\'(?<=<h3>).*(?=<h3>)'
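For context on why those attempts fail: lookarounds such as (?<=...) are Perl-style syntax, so they are only understood when -P is given, and in the first attempt the missing spaces around -e make the shell join everything into a single argument. One possible way around the single-pattern limit is to put both expressions into one pattern with alternation. The following is only a minimal sketch, assuming GNU grep built with PCRE support, a hypothetical sample file whose contents are made up for illustration, and that the <h1> line appears before the <h3> lines:

```shell
# Hypothetical sample file standing in for file.txt (contents are invented).
sample="$(mktemp)"
cat > "$sample" <<'EOF'
<h1>Main title</h1>
<p>intro</p>
<h3>First section</h3>
<h3>Second section</h3>
<h3>Third section</h3>
<h3>Fourth section</h3>
EOF

# A single -P pattern: the two original expressions joined with "|".
# ".*?" is non-greedy so a match stops at the nearest closing tag.
# head -n 3 keeps the <h1> text plus the first two <h3> texts,
# which relies on the assumption that <h1> comes first in the file.
result="$(grep -oP '(?<=<h1>).*?(?=</h1>)|(?<=<h3>).*?(?=</h3>)' "$sample" | head -n 3)"

printf '%s\n' "$result"
rm -f "$sample"
```

With the sample above this should print "Main title", "First section" and "Second section" on separate lines. As the comments below point out, this remains regex matching on markup, so it only suits simple files where each heading sits on one line.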

I'm not sure what -P does, other than that the man page describes it as Perl expressions, and grep only allows one -P pattern at a time. Is there another command that I could use to pull text between

cadteach
  • 3
    suggestions: 1) please click [edit] to add a sample input text and expected output for that, it will help to add clarity and in testing before adding answers 2) using regex to parse html/xml is not a good idea, use tools like xmlstarlet instead – Sundeep Jul 01 '18 at 06:04
  • 3
    [Don't Parse HTML With Regex.](https://stackoverflow.com/a/1732454/3776858) – Cyrus Jul 01 '18 at 06:17
  • Possible duplicate of [Regular Expressions: Is there an AND operator?](https://stackoverflow.com/questions/469913/regular-expressions-is-there-an-and-operator) – Wiktor Stribiżew Aug 29 '18 at 12:44

0 Answers