I need to match a set of strings in Java. Between the two words, each string can contain self-closing HTML tags, one or more whitespace characters, and one or more &nbsp; entities.

For example:

String html = "<p>Stack Overflow is a great site. I really like Stack<br/>Overflow. Stack&nbsp;&nbsp;Overflow has helped me a lot to learn different things. I frequently visit Stack<br></br>Overflow. Stack<div id=\"XX\" />Overflow is really nice.<p><br/><p>Stack and overflow are two different thing.</p>";

Now I need a regular expression that matches the following substrings of the string above.

 1. Stack Overflow 
 2. Stack<br/>Overflow
 3. Stack&nbsp;&nbsp;Overflow
 4. Stack<br></br>Overflow
 5. Stack<div id=\"XX\" />Overflow

But it shouldn't match

  • Stack and overflow
Peter Mortensen
user1061692
  • This sounds dangerously close to [using a regex to parse HTML](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). Are you sure you don't need some sort of HTML parser? – smessing Mar 06 '12 at 18:09
  • I could have used any standard HTML parser like jSoup, but as per the requirement I have to use regex only. – user1061692 Mar 06 '12 at 18:11
  • Is this homework? Why must you use regex only? – neontapir Mar 06 '12 at 22:23

2 Answers

stack(<.*?>|&nbsp;|\s)*overflow
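As a rough sketch of how a pattern like this could be used from Java: the example below compiles it with `Pattern.CASE_INSENSITIVE`, since the sample text capitalises "Stack" and "Overflow". The class name `StackOverflowMatcher` and the method `findMatches` are made up for illustration. It also swaps `<.*?>` for `<[^>]*>`, which is my own substitution, not part of the pattern above: the lazy form can backtrack past the first `>` and accept text such as `Stack<b>and</b>overflow`, while `[^>]*` stays inside a single tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StackOverflowMatcher {
    // Between "stack" and "overflow", allow only tags, "&nbsp;" entities and whitespace.
    // Assumption: <[^>]*> is a stricter stand-in for <.*?> so a tag cannot run past its '>'.
    private static final Pattern PATTERN = Pattern.compile(
            "stack(?:<[^>]*>|&nbsp;|\\s)*overflow",
            Pattern.CASE_INSENSITIVE);

    // Returns every substring of html that the pattern matches, in order of appearance.
    public static List<String> findMatches(String html) {
        List<String> matches = new ArrayList<>();
        Matcher m = PATTERN.matcher(html);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String html = "I really like Stack<br/>Overflow. Stack&nbsp;&nbsp;Overflow helps. "
                + "Stack<div id=\"XX\" />Overflow is nice. Stack and overflow are different.";
        // Prints the three permitted variants; "Stack and overflow" is not matched.
        for (String match : findMatches(html)) {
            System.out.println(match);
        }
    }
}
```

This only checks that something shaped like a tag sits between the two words. It does not validate the HTML itself, which is the part a regex cannot really do.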
publicRavi

If I understand your question correctly, you are looking to match "stack" followed by "overflow", with some optional text allowed between them. If this is what you want, how about this:

(?i)stack.*?overflow

This will not behave very well if your input string contains "stack" but no corresponding "overflow": the match will run on from that "stack" to the next "overflow" later on the same line.
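To make that caveat concrete, here is a small sketch of what the loose pattern picks up. The class name `LazyWildcardDemo` and the helper `firstMatch` are hypothetical, chosen only for this example. Because `.*?` accepts any characters, the pattern matches the "Stack and overflow" case from the question and also stretches from an unrelated "stack" to a later "Overflow".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyWildcardDemo {
    // The loose pattern from this answer: any text at all may sit between the two words.
    private static final Pattern LOOSE = Pattern.compile("(?i)stack.*?overflow");

    // Returns the first match of the loose pattern, or null if there is none.
    public static String firstMatch(String input) {
        Matcher m = LOOSE.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // prints "Stack and overflow", which the question wants rejected
        System.out.println(firstMatch("Stack and overflow are two different things."));
        // prints "stack trace led me to Stack Overflow"
        System.out.println(firstMatch("A stack trace led me to Stack Overflow."));
    }
}
```

So if only tags, spaces and &nbsp; entities should be allowed in the gap, the wildcard has to be replaced with an explicit list of what may appear there.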

You can learn more about Java's regular expression syntax at http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html

Gowtham
  • Yes, you are correct, but the thing is that I will only allow (1) valid HTML, (2) one or more spaces, and (3) one or more "&nbsp;" entities. I won't allow anything else. – user1061692 Mar 06 '12 at 18:07
  • I have already confessed that I am a beginner in regex. I don't have any problem with Java. I only need the regex, or a hint on how to build this complex regex. – user1061692 Mar 06 '12 at 18:12
  • @user1061692: Gowtham gave you the hint that you need, which is to learn more about Java's regular expression syntax by reading through http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html. – ruakh Mar 06 '12 at 18:18