20

I'm using the Sublime Text 2 editor. I would like to use a regex to match all characters between h1 tags.

So far I'm using this:

<h1>.+</h1>

It works fine if the h1 tag doesn't contain line breaks.

For example, with

<h1>Hello this is a header</h1>

it works fine.

But it doesn't work if the tag looks like this:

<h1>
   Hello this is a header
</h1>

Can someone help me with the syntax?

LuFFy
PrivateUser

2 Answers

48

By default, . matches every character except the newline character.

In this case you need the DOTALL option, which makes . match any character, including the newline character. DOTALL can be specified inline as (?s). For example:

(?s)<h1>.+</h1>

However, this still doesn't do what you want when the text contains more than one h1 element: quantifiers are greedy by default (here, +), so .+ consumes as many characters as possible and the match runs from the first <h1> to the last </h1>. Make the quantifier lazy (consume as few characters as possible) by adding a ? after it, giving +?:

(?s)<h1>.+?</h1>
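
The difference between the three patterns can be checked outside the editor. The following is a minimal sketch using Python's re module, which also accepts the inline (?s) flag and lazy quantifiers; Sublime Text uses its own regex engine, so details may differ, and the sample text and variable names here are made up for illustration.

```python
import re

# Hypothetical sample: two multi-line headings separated by a paragraph.
html = "<h1>\n   First header\n</h1>\n<p>text</p>\n<h1>\n   Second header\n</h1>"

# Without DOTALL, "." stops at newlines, so nothing matches.
no_dotall = re.findall(r"<h1>.+</h1>", html)

# Greedy with DOTALL: a single match from the first <h1> to the last </h1>.
greedy = re.findall(r"(?s)<h1>.+</h1>", html)

# Lazy with DOTALL: one match per heading.
lazy = re.findall(r"(?s)<h1>.+?</h1>", html)

print(len(no_dotall), len(greedy), len(lazy))  # prints: 0 1 2
```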

Alternatively, the regex can be <h1>[^<>]*</h1>. In this case, you don't need to specify any option.
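
As a rough illustration of that alternative and of the limitation raised in the comments (it stops matching once the heading contains a nested tag), here is a small Python sketch. The sample strings are invented, and Python's re module stands in for the editor's engine.

```python
import re

# Hypothetical samples: a multi-line heading and a heading with a nested tag.
plain = "<h1>\n   Hello this is a header\n</h1>"
nested = "<h1>Hello <span>this</span> is a header</h1>"

# [^<>] matches any character except angle brackets, newlines included,
# so no DOTALL flag is needed.
pattern = re.compile(r"<h1>[^<>]*</h1>")

plain_match = pattern.search(plain)
nested_match = pattern.search(nested)

print(plain_match is not None, nested_match is not None)  # prints: True False
```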

Panos Kalatzantonakis
Anirudha
  • With OP's regex, specifying those options are not sufficient. – nhahtdh Jan 25 '13 at 15:57
  • @Some1.Kill.The.DJ I have tried your code, but it's still not matching when the tag contains a line break – PrivateUser Jan 25 '13 at 16:03
  • Wouldn't that third regex break if you have any nested tags in h1? Like span or link or whatever... I just tried the "(?s)" and it works in sublime, that's cool. – enrey Jan 26 '13 at 00:04
  • I never knew you could specify flags in regex searches in sublime - thanks for the information @Some1.Kill.The.DJ – Jay Jan 26 '13 at 05:57
  • @enrey yes it would break..but even the 1st and 2nd regex can break if there is another h1 tag in h1 itself – Anirudha Jan 26 '13 at 05:59
  • Not sure how you guys figure this stuff out. Where is the documentation on regex search like this? – Arete Mar 06 '17 at 15:06
  • Works great, slight improvement to get all h tags: /[^<>]*<\/h[1-6]>/g (JS regex expression) BTW you can use jQuery $(el).text() to get just the text content. – David D. Oct 14 '17 at 12:12
26

Since this question is the top Google result for a regex that finds all the characters between h1 tags, I thought I would give that answer as well, since it was what I was looking for.

(?s)(?<=<h1>)(.+?)(?=</h1>)

On a sample text like <h1>A title</h1> <p>Some content</p> <h1>Another title</h1>, the first match of that regex is just A title, without the surrounding tags.
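
To see the lookbehind and lookahead in action, here is a short sketch in Python, whose re module supports the same fixed-width lookaround syntax. It assumes the sample text from this answer; the variable names are illustrative only.

```python
import re

sample = "<h1>A title</h1> <p>Some content</p> <h1>Another title</h1>"

# The lookbehind and lookahead assert that the tags are present
# without including them in the match.
pattern = re.compile(r"(?s)(?<=<h1>)(.+?)(?=</h1>)")

first = pattern.search(sample).group(0)
all_titles = pattern.findall(sample)

print(first)       # prints: A title
print(all_titles)  # prints: ['A title', 'Another title']
```

A single search returns only the first heading's text, while a find-all style search returns the text of every h1 in the input.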

aychedee