26

I am new to regular expressions, and this may be a very easy question (hopefully).

I am trying to use one solution for 3 kinds of string

  • "45%", expected result: "45"
  • "45", expected result: "45"
  • "", expected result: ""

What I am trying (let the string be str):

str.match(/(.*)(?!%*)/i)[1]

In my head, this reads as "match anything up to a '%' if one is found, or else just match anything".

In Firebug's head, it seems to read more like "just match anything and completely disregard the negative lookahead". Making it lazy, with (.*)?, doesn't seem to help either.

Let's forget for a second that in this specific situation I am only matching numbers, so a /\d*/ would do. I am trying to understand a general rule so that I can apply it whenever.

Would anybody be so kind as to help me out?

turivishal
undefinederror
  • 1
    The negative lookahead `(?!%*)` says: _"assert that zero or more percent signs do not follow"_. This assertion can never be true, because `%*` always matches! (`%*` can match nothing at all, which succeeds everywhere, even in an empty string.) – ridgerunner Dec 21 '11 at 03:47
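
A quick way to check the comment above, as a minimal sketch. The second pattern, with `(?!%)` instead of `(?!%*)`, is not from the question; it is included here only for comparison, and the variable names are made up for illustration.

```javascript
// (?!%*) can never succeed, because %* is able to match the empty
// string at every position, so the whole pattern never matches.
const original = /(.*)(?!%*)/i;

// Assumed variant for comparison: a lookahead for a single %.
const singlePercent = /(.*)(?!%)/i;

const originalResult = "45%".match(original);     // no match at all
const singleResult = "45%".match(singlePercent);  // greedy .* runs to the end

console.log(originalResult);   // null
console.log(singleResult[1]);  // "45%"
```

Because the first call returns null, indexing it with `[1]` throws a TypeError. In the second pattern the greedy `.*` consumes the `%` as well, and the lookahead is then trivially satisfied at the end of the string.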

5 Answers

42

How about the simpler

str.match(/[^%]*/i)[0]

This means: match zero or more characters, none of which is a %.
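
As a small sketch, here is that pattern run against the three inputs from the question. The `i` flag is dropped because the pattern contains no letters; the variable names are only illustrative.

```javascript
const upToPercent = /[^%]*/;

// The pattern can match the empty string, so match() never returns
// null here and [0] is always safe to read.
const inputs = ["45%", "45", ""];
const results = inputs.map((s) => s.match(upToPercent)[0]);

console.log(results); // the three values are "45", "45" and ""
```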


Edit: If you need to parse until </a>, then you could match a sequence of characters followed by </a> and then discard the </a>, which means you should use a positive lookahead instead of a negative one.

str.match(/.*?(?=<\/a>|$)/i)[0]

This means: match zero or more characters lazily, until reaching a </a> or the end of the string.

Note that *? is a single operator, so (.*)? is not the same as .*?.

(And don't parse HTML with a single regex, as usual.)
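
A short sketch of both points, assuming the sample strings below (they are illustrative, not from the answer): the lazy match with a positive lookahead, and the difference between `(.*)?` and `.*?`.

```javascript
const untilClose = /.*?(?=<\/a>|$)/i;

// Stops right before the first </a>.
const withTag = "Here is a <a>link</a>.".match(untilClose)[0];
// No </a> present, so the $ alternative lets the match run to the end.
const withoutTag = "45".match(untilClose)[0];

// (.*)? is a greedy group that is merely optional, so it still takes everything.
const optionalGroup = "45%".match(/(.*)?/)[0];
// .*? is lazy, and with nothing after it, it is satisfied by matching nothing.
const lazyAlone = "45%".match(/.*?/)[0];

console.log(withTag);       // "Here is a <a>link"
console.log(withoutTag);    // "45"
console.log(optionalGroup); // "45%"
console.log(lazyAlone);     // ""
```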

kennytm
  • In Regular Expressions, particularly the JavaScript flavor, the `^` character means to match starting from the beginning of the reference string. https://developer.mozilla.org/en/JavaScript/Guide/Regular_Expressions – austincheney Dec 21 '11 at 03:29
  • @austincheney: That is true when `^` is used as a text anchor, but `^` has a different meaning when used within a character class, i.e. it negates the match ("anything but these chars..."). – bobbymcr Dec 21 '11 at 03:31
  • Thank you Kenny, that works. But what if "%" was "</a>"? I would like to exclude a pattern longer than a single character. And just to make it clearer, "</a>" (or any pattern) might or might not be there. – undefinederror Dec 21 '11 at 03:35
  • Thank you Kenny, that is exactly what I was hoping to find. See my comment to Alan. Also I really appreciated that you took the time to explain it bit by bit... and Merry Christmas! – undefinederror Dec 25 '11 at 22:47
9

I think this is what you're looking for:

/(?:(?!%).)*/

The . matches any character, but only after the negative lookahead, (?!%), confirms that the character is not %. Note that when the sentinel is a single character like %, you can use a negated character class instead, for example:

/[^%]*/

But for a multi-character sentinel like </a>, you have to use the lookahead approach:

/(?:(?!<\/a>).)*/i

This is actually saying "Match zero or more characters one at a time, but if the next character turns out to be the beginning of the sequence </a> or </A>, stop without consuming it".
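
To apply the same idea to an arbitrary sentinel, one possible sketch is a small helper. `matchUntil` is a hypothetical name, not something from the answer. It also makes two assumptions of its own: it escapes regex metacharacters in the sentinel, and it uses `[\s\S]` in place of `.` so that newlines are consumed too.

```javascript
// Hypothetical helper: returns the part of `str` before the first
// occurrence of `sentinel`, or all of `str` when the sentinel is absent.
function matchUntil(str, sentinel) {
  // Escape regex metacharacters so the sentinel is treated literally.
  const escaped = sentinel.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // One character at a time, but only while the sentinel does not start here.
  const pattern = new RegExp("^(?:(?!" + escaped + ")[\\s\\S])*");
  // The pattern can match the empty string, so match() is never null.
  return str.match(pattern)[0];
}

console.log(matchUntil("Here is a <a>link</a>.", "</a>")); // "Here is a <a>link"
console.log(matchUntil("45%", "%"));                       // "45"
console.log(matchUntil("45", "%"));                        // "45"
```

Unlike the pattern above, this sketch is case-sensitive; passing `"i"` as the second argument to `RegExp` would restore the original behaviour.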

Alan Moore
  • This is great. Exactly what I was looking for. Thanks to you and Kenny now I know how to make one step at a time making sure my last step does not fall on a certain pattern `/((?!pattern).)*/` and how to make a long walk until the next character is the beginning of my pattern, or the end of a string `/.*?(?=pattern|$)/`. I think Kenny's is more what I was expecting to find, while yours is less obvious and definitely brilliant. I don't think I would have ever thought of it. Thank you! – undefinederror Dec 25 '11 at 22:44
3

The easiest way with an exact search string is to skip regular expressions and just use indexOf, e.g.:

// String to be searched
var s = "Here is a <a>link</a>.";

// String to find
var searchString = "</a>";

// Final match
var matched = "";

var c = s.indexOf(searchString);
if (c >= 0)
{
    // Returns the portion not including the search string;
    // in this example, "Here is a <a>link". If you want the
    // search string included, add the length of the search
    // string to c.
    matched = s.substring(0, c);
}
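
The same approach can be wrapped in a function that also covers the "not found" case from the question, returning the whole string when the search string is absent. This is a sketch; `before` is a hypothetical name chosen for illustration.

```javascript
// Hypothetical helper: everything before the first occurrence of
// searchString, or the whole string when it does not occur.
function before(s, searchString) {
  const i = s.indexOf(searchString);
  return i === -1 ? s : s.substring(0, i);
}

console.log(before("Here is a <a>link</a>.", "</a>")); // "Here is a <a>link"
console.log(before("45%", "%"));                       // "45"
console.log(before("45", "%"));                        // "45"
console.log(before("", "%"));                          // ""
```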
bobbymcr
  • 2
    Thank you bobby, but I was looking for a RegEx solution. What you described is what I would normally do, but in doing so I would end up reiterating this little vocabulary of mine.. – undefinederror Dec 25 '11 at 22:17
1

I just wrote it exactly how you said it:

str.match(/(^[^%]*$)|^([^%]*)%.*/i)

This will match any string without a '%' or the first part of a string that contains a %. You have to get the result from the 1st or 2nd group.

EDIT: The following is exactly what you want:

str.match(/(?:^[^%]*$)|^(?:[^%]*)(?=%)/)
  • The ?: makes each group non-capturing, so the result is in [0]
  • The ?= is a lookahead that asserts the next character is a %, without consuming it
  • and [^%] matches any character that is not a %

So the regex reads: match any string that doesn't contain a %, OR (otherwise) match all of the characters before the first %.
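
A sketch of how the two versions might be used. `firstGroup` is a hypothetical helper that picks whichever capturing group took part in the match for the first pattern; the second pattern needs no such step.

```javascript
const twoGroups = /(^[^%]*$)|^([^%]*)%.*/i;
const noGroups = /(?:^[^%]*$)|^(?:[^%]*)(?=%)/;

// Hypothetical helper: only one of the two groups participates in a
// match, and the other one is left undefined.
function firstGroup(str) {
  const m = str.match(twoGroups);
  if (m === null) return null;
  return m[1] !== undefined ? m[1] : m[2];
}

console.log(firstGroup("45%"));         // "45" (from group 2)
console.log(firstGroup("45"));          // "45" (from group 1)
console.log("45%".match(noGroups)[0]);  // "45"
console.log("".match(noGroups)[0]);     // ""
```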

jermel
0

To match 45, 45%, and a number of any length (182%, 18242, etc.), use this:

str.match(/([0-9]+)([%]?)/)[1];

If you need to match the empty string as well, include it as ^$. Note that match("...")[1] will be undefined for the empty string, so you will need to test the match result first and then check [0], or see whether [1] is undefined.

str.match(/([0-9]+)([%]?)|^$/)

If you need to match exactly two digits, use {2,2} and anchor the expression between the start-of-line and end-of-line characters: "^(exp)$".

str.match(/^([0-9]{2,2})([%]?)$/)[1];
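
The null and undefined checks described above could look roughly like this. It is a sketch; `firstNumber` is a hypothetical name, and returning null for input with no digits is an assumption about the desired behaviour.

```javascript
// Hypothetical helper around the second pattern above.
function firstNumber(str) {
  const m = str.match(/([0-9]+)([%]?)|^$/);
  if (m === null) return null;            // no digits, and not empty
  // The ^$ branch matches the empty string and leaves group 1 unset.
  return m[1] === undefined ? "" : m[1];
}

console.log(firstNumber("45%"));   // "45"
console.log(firstNumber("18242")); // "18242"
console.log(firstNumber(""));      // ""
console.log(firstNumber("abc"));   // null
```

Since the pattern is not anchored, the digits do not have to come first: an input such as "abc45%" would also yield "45".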
Richard Logwood