-4

I want a regular expression that matches either `<FONT FACE=\"Verdana\" SIZE=\"12\"> My Name is xyz </FONT>` or `<LI><FONT FACE=\"Verdana\" SIZE=\"12\"> My Name is xyz </FONT></LI>`, and it should not be greedy.

VLAZ
java_geek
  • Which part are you looking for and which parts can vary? Are you looking for *exactly* that string, or are you looking for "My Name is" whatever the font face is, or what? – Paul Tomblin Aug 16 '10 at 17:56
  • possible duplicate of [Java Regular expression](http://stackoverflow.com/questions/3474755/java-regular-expression) – Paul Tomblin Aug 16 '10 at 17:58
  • I am looking for a regex that will completely match either of the 2 strings above. – java_geek Aug 16 '10 at 18:01
  • @java_geek: `.*` will match those 2 strings completely. And many other strings as well. If you don't want to match the other strings, then say what those strings are. It always helps to be specific in saying what you want to match and what you don't want to match in regex questions. – polygenelubricants Aug 17 '10 at 06:15
  • OK, assume that there is a huge file and you want to match either of the above lines using your regular expression. That's exactly what I want. – java_geek Aug 17 '10 at 07:20
  • You realize that HTML isn't a "regular" language, which makes regular expressions inappropriate for processing it. Use one of the dozen solutions available for Java for processing HTML, or better yet look at Beautiful Soup. –  Aug 17 '10 at 14:29

4 Answers

6

Three questions in as many hours. Must be a record of some sort.

For the sake of humanity, don't use regular expressions to parse XML!

Community
Manoj Govindan
2

You should not be using regular expressions for this.

Woot4Moo
1

Why not use a Java XML parser?
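
As a rough sketch of what that could look like, the JDK's built-in DOM parser can read both lines from the question, assuming each one is a well-formed XML fragment with a single root element. The class and method names here (`FontElementReader`, `fontText`, `fontFace`) are made up for illustration, and real-world HTML is often not well-formed XML, in which case a dedicated HTML parser would be needed instead.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class FontElementReader {

    // Parses the fragment and returns its first <FONT> element, or null if there is none.
    // Assumption: the input is well-formed XML; tag names are matched case-sensitively.
    static Element firstFont(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList fonts = doc.getElementsByTagName("FONT");
            return fonts.getLength() == 0 ? null : (Element) fonts.item(0);
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    // Trimmed text inside the first <FONT> element, or null if there is none.
    static String fontText(String xml) {
        Element font = firstFont(xml);
        return font == null ? null : font.getTextContent().trim();
    }

    // Value of the FACE attribute on the first <FONT> element, or null if there is none.
    static String fontFace(String xml) {
        Element font = firstFont(xml);
        return font == null ? null : font.getAttribute("FACE");
    }

    public static void main(String[] args) {
        String bare = "<FONT FACE=\"Verdana\" SIZE=\"12\"> My Name is xyz </FONT>";
        String inList = "<LI>" + bare + "</LI>";
        System.out.println(fontText(bare));
        System.out.println(fontText(inList));
        System.out.println(fontFace(inList));
    }
}
```

Because the parser exposes elements and attributes by name, the same calls work whether or not the `<FONT>` element is wrapped in `<LI>`.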

treeface
  • What...Google is of no use? Perhaps the OP should've used Bing? – treeface Aug 16 '10 at 18:11
  • If you have a problem and solve it with a regular expression, now you have 2 problems. If you don't think HTML and XML parsers are of any use, then you will never understand why nobody will answer your question with what you want: because they can't. –  Aug 17 '10 at 14:33
  • @fuzzy Who are you talking to? – treeface Aug 17 '10 at 21:36
0

You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML. As I have answered in HTML-and-regex questions here so many times before, the use of regex will not allow you to consume HTML. Regular expressions are a tool that is insufficiently sophisticated to understand the constructs employed by HTML. HTML is not a regular language and hence cannot be parsed by regular expressions. Regex queries are not equipped to break down HTML into its meaningful parts.

That is the beginning of the answer to this exact question, which has over 4300 upvotes. If you don't understand this and insist on ignoring this advice, then you probably should not be programming.

What happens when FACE and SIZE are in opposite order?
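
To make that concrete, here is a small sketch of how a non-greedy pattern written against the exact line from the question behaves once the attributes are swapped. The pattern and the class name `AttributeOrderDemo` are hypothetical, chosen only to illustrate the point; they are not a recommended solution.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class AttributeOrderDemo {

    // A non-greedy pattern that hard-codes FACE before SIZE (illustrative assumption).
    static final Pattern FACE_THEN_SIZE = Pattern.compile(
            "<FONT FACE=\"[^\"]*\" SIZE=\"[^\"]*\">.*?</FONT>");

    // True if the line contains a <FONT> element in the exact shape the pattern expects.
    static boolean matches(String line) {
        return FACE_THEN_SIZE.matcher(line).find();
    }

    public static void main(String[] args) {
        String original = "<FONT FACE=\"Verdana\" SIZE=\"12\"> My Name is xyz </FONT>";
        String reordered = "<FONT SIZE=\"12\" FACE=\"Verdana\"> My Name is xyz </FONT>";
        System.out.println(matches(original));  // prints true
        System.out.println(matches(reordered)); // prints false
    }
}
```

Both inputs describe the same element, yet only the first one matches. Extra whitespace, single quotes, or additional attributes would break the pattern in the same way.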

Community
  • See http://stackoverflow.com/a/1732454/1090657 for more details. – quantum Oct 20 '12 at 01:24
  • This link is in a previous answer, you even commented on it. Aside from going on a meaningless rant and asking a question, what new information does this answer provide? – Sam Oct 21 '12 at 17:27
  • @Sam More than your comment on a 2 year old question/answer! –  Oct 21 '12 at 17:39
  • You updated this about an hour ago... – Sam Oct 21 '12 at 17:41