JavaScript regular expression: match anything up until something (if it exists)

Question

I am new to regular expressions and this may be a very easy question (hopefully).

I am trying to use one solution for 3 kinds of string:

"45%", expected result: "45"
"45", expected result: "45"
"", expected result: ""

What I am trying (let the string be str):

str.match(/(.*)(?!%*)/i)[1]

In my head, this reads as "match any instance of anything up until '%' if it is found, or else just match anything".

In Firebug's head, it seems to read more like "just match anything and completely disregard the negative lookahead". Making it lazy with (.*)? doesn't seem to help either.

Let's forget for a second that in this specific situation I am only matching numbers, so a /\d*/ would do. I am trying to understand a general rule so that I can apply it whenever.

Would anybody be so kind as to help me out?

The negative lookahead:`(?!%*)` says: _"assert that zero or more percent signs do not follow"_ This assertion can never be true because `%*` is always true! (`%*` matches nothing at all - which is _always_ true everywhere - even for an empty string.) — ridgerunner, Dec 21 '11 at 03:47
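The point made in this comment can be checked directly. The snippet below is only a small sketch; the variable names are made up for illustration. It tests the bare lookahead and the pattern from the question against the three sample strings.

```javascript
// %* can match the empty string at any position, so the negative
// lookahead (?!%*) is rejected everywhere.
const neverMatches = /(?!%*)/;
const samples = ["45%", "45", ""];

// test() returns false for every sample.
const lookaheadResults = samples.map((s) => neverMatches.test(s));

// The pattern from the question therefore finds no match at all,
// so match() returns null and indexing it with [1] would throw.
const questionResults = samples.map((s) => s.match(/(.*)(?!%*)/i));

console.log(lookaheadResults, questionResults);
```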

score 42 · Accepted Answer · edited May 23 '17 at 12:09

How about the simpler

str.match(/[^%]*/i)[0]

Which means: match zero or more characters that are not a %.
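As a quick check against the three strings from the question, here is a minimal sketch (the helper name upToPercent is just an illustrative choice):

```javascript
// [^%]* always matches, possibly with an empty result,
// so indexing the match with [0] is safe here.
const upToPercent = (str) => str.match(/[^%]*/)[0];

console.log(upToPercent("45%")); // "45"
console.log(upToPercent("45"));  // "45"
console.log(upToPercent(""));    // ""
```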

Edit: If you need to parse up to </a>, you could match a sequence of characters followed by </a> and then discard the </a>, which means you should use a positive lookahead instead of a negative one.

str.match(/.*?(?=<\/a>|$)/i)[0]

This means: match zero or more characters lazily, until reaching a </a> or the end of the string.

Note that *? is a single operator; (.*)? is not the same as .*?.

(And don't parse HTML with a single regex, as usual.)
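A short sketch of the lazy pattern and of the difference between (.*)? and .*? follows. The helper name beforeClosingAnchor and the sample strings are assumptions made for illustration only.

```javascript
// Lazily consume characters until </a> or the end of the string is next.
// Note that . does not match line breaks unless the s flag is used.
const beforeClosingAnchor = (str) => str.match(/.*?(?=<\/a>|$)/i)[0];

console.log(beforeClosingAnchor("link</a> tail")); // "link"
console.log(beforeClosingAnchor("no closing tag")); // "no closing tag"

// (.*)? is an optional group around a greedy .*, so it still takes everything.
const optionalGreedy = "45%".match(/(.*)?/)[0]; // "45%"

// .*? is lazy: with nothing after it, it is satisfied by the empty string.
const lazy = "45%".match(/.*?/)[0]; // ""

console.log(optionalGreedy, lazy);
```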

answered Dec 21 '11 at 03:23

kennytm


In Regular Expressions, particularly the JavaScript flavor, the `^` character means to match starting from the beginning of the reference string. https://developer.mozilla.org/en/JavaScript/Guide/Regular_Expressions – austincheney Dec 21 '11 at 03:29
@austincheney: That is true when `^` is used as a text anchor, but `^` has a different meaning when used within a character class, i.e. it negates the match ("anything but these chars..."). – bobbymcr Dec 21 '11 at 03:31
Thank you Kenny, that works. But what if "%" was "</a>"? I would like to exclude a pattern longer than a single character. And just to make it clearer, "</a>" (or any pattern) might or might not be there. – undefinederror Dec 21 '11 at 03:35
Thank you Kenny, that is exactly what I was hoping to find. See my comment to Alan. Also I really appreciated that you took the time to explain it bit by bit... and Merry Christmas! – undefinederror Dec 25 '11 at 22:47

score 9 · Answer 2 · answered Dec 21 '11 at 04:46

I think this is what you're looking for:

/(?:(?!%).)*/

The . matches any character, but only after the negative lookahead, (?!%), confirms that the character is not %. Note that when the sentinel is a single character like %, you can use a negated character class instead, for example:

/[^%]*/

But for a multi-character sentinel like </a>, you have to use the lookahead approach:

/(?:(?!<\/a>).)*/i

This is actually saying "Match zero or more characters one at a time, but if the next character turns out to be the beginning of the sequence </a> or </A>, stop without consuming it".
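To reuse this approach with an arbitrary sentinel, one possible sketch builds the pattern at run time. The helpers escapeRegExp and upTo are hypothetical names introduced here and are not part of the answer; the sentinel is escaped so that characters such as ? or ( are treated literally.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Consume one character at a time, but only while the sentinel does not
// start at the current position. Case-insensitive, like the /i example.
function upTo(str, sentinel) {
  const pattern = new RegExp("(?:(?!" + escapeRegExp(sentinel) + ").)*", "i");
  return str.match(pattern)[0];
}

console.log(upTo("Here is a <a>link</a>.", "</a>")); // "Here is a <a>link"
console.log(upTo("LINK</A> rest", "</a>"));          // "LINK"
console.log(upTo("45%", "%"));                       // "45"
console.log(upTo("45", "%"));                        // "45"
```

Because the pattern uses . it stops at a line break as well; that is a property of the original expression, not of the helper.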

Alan Moore


This is great. Exactly what I was looking for. Thanks to you and Kenny now I know how to make one step at a time making sure my last step does not fall on a certain pattern `/((?!pattern).)*/` and how to make a long walk until the next character is the beginning of my pattern, or the end of a string `/.*?(?=pattern|$)/`. I think Kenny's is more what I was expecting to find, while yours is less obvious and definitely brilliant. I don't think I would have ever thought of it. Thank you! – undefinederror Dec 25 '11 at 22:44

score 3 · Answer 3 · answered Dec 21 '11 at 03:44

The easiest way with an exact search string is to skip regular expressions and just use indexOf, e.g.:

// String to be searched
var s = "Here is a <a>link</a>.";

// String to find
var searchString = "</a>";

// Final match
var matched = "";

var c = s.indexOf(searchString);
if (c >= 0)
{
    // Returns the portion before the search string;
    // in this example, "Here is a <a>link". If you want the
    // search string included, use
    // s.substring(0, c + searchString.length) instead.
    matched = s.substring(0, c);
}
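For the "if it exists" part of the question, the indexOf idea could be wrapped in a small function that returns the whole string when the search string is absent. This is only a sketch, and the function name before is a hypothetical choice.

```javascript
// Return everything before the first occurrence of searchString,
// or the whole string when searchString does not occur.
function before(s, searchString) {
  const c = s.indexOf(searchString);
  return c >= 0 ? s.substring(0, c) : s;
}

console.log(before("Here is a <a>link</a>.", "</a>")); // "Here is a <a>link"
console.log(before("45%", "%"));                        // "45"
console.log(before("45", "%"));                         // "45"
console.log(before("", "%"));                           // ""
```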

Thank you bobby, but I was looking for a RegEx solution. What you described is what I would normally do, but in doing so I would end up reiterating this little vocabulary of mine.. — undefinederror, Dec 25 '11 at 22:17

jermel · Answer 4 · 2011-12-21T04:24:52.790

I just wrote it exactly how you said it:

str.match(/(^[^%]*$)|^([^%]*)%.*/i)

This will match any string without a '%' or the first part of a string that contains a %. You have to get the result from the 1st or 2nd group.

EDIT: The version below is exactly what you want:

str.match(/(?:^[^%]*$)|^(?:[^%]*)(?=%)/)

The ?: makes a group non-capturing
The ?= is a lookahead that checks whether a % follows
and [^%] matches any character that is not a %

So the regex reads: match any string that doesn't contain %, OR (otherwise) match all of the characters before the first %.
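The two patterns in this answer can be compared side by side. In this sketch the function names are invented for illustration. The first version has to pick whichever capture group took part in the match, while the second can read the whole match from index 0.

```javascript
// First pattern: the result is in group 1 (no % present) or group 2.
function firstPartByGroups(str) {
  const m = str.match(/(^[^%]*$)|^([^%]*)%.*/i);
  if (m === null) return null; // defensive check
  // Compare with undefined: group 1 may legitimately be the empty string.
  return m[1] !== undefined ? m[1] : m[2];
}

// Second pattern: non-capturing groups plus a lookahead, so the
// overall match at index 0 is already the wanted text.
function firstPartByLookahead(str) {
  const m = str.match(/(?:^[^%]*$)|^(?:[^%]*)(?=%)/);
  return m === null ? null : m[0];
}

for (const s of ["45%", "45", ""]) {
  console.log(JSON.stringify(firstPartByGroups(s)), JSON.stringify(firstPartByLookahead(s)));
}
```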

score 0 · Answer 5 · answered Dec 21 '11 at 04:21

To match 45, 45%, and numbers of any length (182%, 18242, etc.), use this:

str.match(/([0-9]+)([%]?)/)[1];

If you also need to match the empty string, include it as ^$. Note that match("...")[1] will be undefined for the empty string, so you will need to test that match returned a result and then check [0], or see whether [1] is undefined.

str.match(/([0-9]+)([%]?)|^$/)
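The checks described above might look like the following sketch. The function name firstNumber is hypothetical, and returning null for "no match" is an assumption about what the caller wants.

```javascript
// Returns the first run of digits, "" for the empty string,
// or null when the input is non-empty and contains no digits.
function firstNumber(str) {
  const m = str.match(/([0-9]+)([%]?)|^$/);
  if (m === null) return null;            // neither alternative matched
  return m[1] === undefined ? "" : m[1];  // ^$ branch leaves group 1 unset
}

console.log(firstNumber("45%"));   // "45"
console.log(firstNumber("18242")); // "18242"
console.log(firstNumber(""));      // ""
console.log(firstNumber("abc"));   // null
```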

If you need to match exactly two digits, use {2,2} (equivalent to {2}) and anchor the expression between the start and end anchors: "^(exp)$"

str.match(/^([0-9]{2,2})([%]?)$/)[1];
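Since match returns null when the anchored pattern does not fit the whole string, indexing the result directly with [1] would throw for inputs such as "182%". A guarded variant could look like this sketch (the name twoDigits is made up for illustration):

```javascript
// Accept exactly two digits, optionally followed by a single %.
function twoDigits(str) {
  const m = str.match(/^([0-9]{2})([%]?)$/);
  return m === null ? null : m[1];
}

console.log(twoDigits("45%"));  // "45"
console.log(twoDigits("45"));   // "45"
console.log(twoDigits("182%")); // null
console.log(twoDigits(""));     // null
```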
