I have a regular expression that runs through HTML tags and grabs their values. I currently use this to grab everything inside the title tag:

<title\b[^>]*>(.*\s?)</title>

It works perfectly. So if I have a bunch of pages that have titles:

<title>Index</title>

<title>Artwork</title>

<title>Theory</title>

The values returned are: Index, Artwork, Theory

How can I make this regular expression ignore all tags with the value Theory inside them?

Thanks in advance,

Ricky
  • See http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – BrunoLM Sep 30 '10 at 20:29
  • Any particular reason you want to do this with only a regular expression? It's really not very well suited for parsing HTML. – zigdon Sep 30 '10 at 20:36
  • Yes, there is a very particular reason. I just said HTML to keep it simple, but it's really for a program that grabs XML data and inserts it into a database. – Ricky Sep 30 '10 at 20:38
  • The question linked above doesn't exactly get me where I need to be. – Ricky Sep 30 '10 at 20:42
  • The person asking this question wants to exclude tags with certain attributes - while I'm looking to exclude values with certain attributes within the tags themselves. Same logic but different expressions. – Ricky Sep 30 '10 at 20:43
  • Parse the XML, you will be much better off. What language? – Aaron McIver Sep 30 '10 at 20:45

1 Answer

A basic lookaround (here, a negative lookahead) would probably handle that.

<title\b[^>]*>(((?!Juju - Search Results).)*)(.*\s?)<\/title>
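
One caveat: the trailing `(.*\s?)` group in the pattern above can still swallow the excluded phrase, because the first group is allowed to match nothing. Below is a minimal Python sketch of one way to tighten it. Python is an assumption, since the language was never stated, and `EXCLUDED`, `TITLE_RE` and `titles` are illustrative names, not anything from the original program. The lookahead is tested before every character of the title text, so a title is skipped if the phrase appears anywhere inside it.

```python
import re

# Hypothetical phrase to exclude; swap in "Theory" or any other literal text.
EXCLUDED = "Juju - Search Results"

# The negative lookahead is checked before each character of the title text.
# [^<] (instead of .) keeps the match inside one element and allows newlines.
TITLE_RE = re.compile(
    r"<title\b[^>]*>((?:(?!" + re.escape(EXCLUDED) + r")[^<])*)</title>"
)

def titles(markup):
    """Return the text of every <title> element that lacks EXCLUDED."""
    return TITLE_RE.findall(markup)

sample = (
    "<title>Index</title>\n"
    "<title>Artwork</title>\n"
    "<title>Juju - Search Results for art</title>\n"
)
print(titles(sample))  # ['Index', 'Artwork']
```

To skip only titles that start with the phrase, a single lookahead right after the opening tag should be enough: `<title\b[^>]*>(?!Juju - Search Results)([^<]*)</title>`.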
Snekse
  • That's a nice little program you have there but on my end there is no execute button to test. – Ricky Sep 30 '10 at 20:55
  • I've tested the above code and it still didn't work. Let's say for instance instead of the value Theory - I want to ignore the value "Juju - Search Results". The regex can even exclude values that begin with the first 4 words without even being concerned with spaces. – Ricky Sep 30 '10 at 20:57
  • Not sure I understand what you're getting at. I've updated the example with the regex that should handle the case you mentioned. – Snekse Sep 30 '10 at 21:26
  • I had to switch the parentheses around and remove the last set of brackets and the *, but it works like a charm!! – Ricky Sep 30 '10 at 21:33
  • Just out of curiosity, can you post your final pattern? – Snekse Sep 30 '10 at 21:59
  • I've updated my code and the link. I was missing an escape on the forward slash in the closing title element. – Snekse Jun 18 '21 at 15:28