
I need a regular expression that matches ">" only when it is used as a greater-than sign.

For example, for this string it should return true: "if x> 2"

and for this string it should return false: "<template>"

I have tried [^<][a-zA-Z0-9_]+[a-zA-Z0-9_ ]*> as the regular expression, but the problem is that it matches a substring. For example, in <template> it finds template> and returns true.

Thanks.

EDIT:

I am now using the regular expression [^<a-zA-Z0-9_][a-zA-Z0-9_]+[ ]*>. I tried it over the entire Firefox 1.0 source code and it seems to be working fine.
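To see the difference between the two patterns, here is a small sketch in Python. Python is my assumption (the question does not say which regex engine is in use), and the helper name `found` and the extra sample lines are made up for illustration.

```python
import re

# First attempt from the question: [^<] can match the "t" in "<template>",
# so the rest of the tag name still leads up to ">".
FIRST_TRY = re.compile(r"[^<][a-zA-Z0-9_]+[a-zA-Z0-9_ ]*>")

# Revised pattern from the EDIT: the character before the identifier
# must be neither "<" nor an identifier character.
REVISED = re.compile(r"[^<a-zA-Z0-9_][a-zA-Z0-9_]+[ ]*>")


def found(pattern, line):
    """Return True if the pattern matches anywhere in the line."""
    return pattern.search(line) is not None


# Print one row per sample line: the line, FIRST_TRY result, REVISED result.
for line in ["if x> 2", "<template>", "x<2 and y>3", "x>2"]:
    print(repr(line), found(FIRST_TRY, line), found(REVISED, line))
```

Note that both patterns need at least one character before the identifier, so a line that starts with `x>2` is not reported by either of them.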

Naman
yossi

2 Answers


It sounds like you want to match lines that contain > but not <. This pattern will do that:

/^(?=.*>)[^<]+$/

However, I'm curious why you want to do this. It sounds suspiciously like you're trying to parse HTML with regular expressions, which is usually A Bad Idea.
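As a quick check of the lookahead pattern above, here is a minimal Python sketch. The function name `matches` and the sample lines are only for illustration.

```python
import re

# (?=.*>) requires a ">" somewhere on the line;
# [^<]+ then requires that the whole line contains no "<".
LINE_WITH_GT_NO_LT = re.compile(r"^(?=.*>)[^<]+$")


def matches(line):
    """Return True if the line contains ">" and no "<"."""
    return LINE_WITH_GT_NO_LT.search(line) is not None


for line in ["if x> 2", "<template>", "x<2 and y>3", "no brackets"]:
    print(repr(line), matches(line))
```

The third sample shows the limitation: a line such as `x<2 and y>3` is rejected because it also contains `<`, even though its `>` is a real comparison.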

EDIT:

It's clearer now what you're trying to do, but you should be aware that this pushes the limits of what regular expressions are capable of. They can't really tell the difference between a template declaration and text with angle brackets in it, but if you know your template declarations all match a very specific pattern, you can do a pretty good job of catching them.

If all your template declarations follow the <[0-9]+template> pattern, you can do this:

/^.*(?<!<\d+template)>.*$/
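One caveat: the lookbehind above is variable-width (because of `\d+`), and not every engine accepts that. Python's `re` module, for instance, only allows fixed-width lookbehind. Below is a rough Python sketch of the same idea without lookbehind: look at every `>` and check whether the text just before it ends in `<digits>template`. The names `TEMPLATE_PREFIX` and `has_non_template_gt` are hypothetical.

```python
import re

# Text that ends with "<", one or more digits, then "template".
TEMPLATE_PREFIX = re.compile(r"<\d+template\Z")


def has_non_template_gt(line):
    """Return True if some ">" in the line does not close a <N template> tag."""
    for m in re.finditer(">", line):
        # Check the text to the left of this ">" by hand,
        # which stands in for the negative lookbehind.
        if not TEMPLATE_PREFIX.search(line[:m.start()]):
            return True
    return False


for line in ["if x> 2", "<1template>", "x<2 and y>3", "<12template> y > 3"]:
    print(repr(line), has_non_template_gt(line))
```

This only skips declarations that follow the strict `<[0-9]+template>` convention. Anything else with angle brackets is still reported.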

If your templates don't follow such a strict convention, you need a true C++ parser for this. It will be basically impossible for regex to tell the difference between a template declaration and this:

a=b<c>d;

...which is valid code in C++ (translating, I believe, to a = (b < c) > d;).

Community
Justin Morgan - On strike
  • I am trying to find greater-than occurrences in C source code, but I don't want to find – yossi Jun 14 '11 at 18:54
  • @yossi - Okay. This regex should do that unless the template declaration spans multiple lines. This should be a simple enough task for regex to fit your needs 99.9% of the time, but if you need a *really* robust solution, you're going to need a context-aware parser (i.e. something that can actually understand C code). – Justin Morgan - On strike Jun 14 '11 at 18:58
  • That didn't work. It's not that I want "to match lines that contain > but not <", because I should find things like "x<2 and y>3". – yossi Jun 14 '11 at 19:06
  • @yossi - Do all your templates follow the `<_template>` naming convention? Do they all have the word "template" in the declaration? I haven't done C++ for a while; can there be nested brackets (e.g. `<1template<2template>>`)? – Justin Morgan - On strike Jun 14 '11 at 19:20
  • No, they don't all have the word "template" in them. About the nesting of brackets, I don't know. – yossi Jun 14 '11 at 19:24
  • @yossi - In that case, you can save some time with regex, but what you really need is a C++ parser or an IDE with a "Find All References"-type feature. See edit. – Justin Morgan - On strike Jun 14 '11 at 19:29

A regex seems like the wrong tool for the job you're trying to do. You'll probably require a full-blown C++ parser to reliably distinguish ">" the greater-than operator from ">" the template delimiter, or ">" as part of a string literal or comment.
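To illustrate the kind of cases meant here, this is a small Python sketch that runs the revised pattern from the question over a few lines of C/C++. The sample lines and the names `REVISED` and `reported` are my own, chosen only for illustration.

```python
import re

# Revised pattern from the question's EDIT.
REVISED = re.compile(r"[^<a-zA-Z0-9_][a-zA-Z0-9_]+[ ]*>")


def reported(line):
    """Return True if the pattern reports a greater-than sign in the line."""
    return REVISED.search(line) is not None


samples = [
    '#include <stdio.h>',  # closing bracket of an include, not an operator
    'printf("a>b");',      # ">" inside a string literal
    '/* x > y */',         # ">" inside a comment
    'a=b<c>d;',            # a real greater-than operator after "<"
]
for line in samples:
    print(repr(line), reported(line))
```

The first three lines are reported even though none of them contains a greater-than operator, and the last one is missed even though it does. Telling these apart needs knowledge of the surrounding syntax, which is what a parser provides.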

Jim Lewis