I am looking for a pattern that matches everything until the first occurrence of a specific character, say a ";" - a semicolon.

I wrote this:

/^(.*);/

But it actually matches everything (including the semicolon) until the last occurrence of a semicolon.

Leon Fedotov

You need

/^[^;]*/

The [^;] is a character class: it matches any character except a semicolon.

^ (the start anchor: start of the string, or start of each line in multiline mode) is added to the beginning of the regex so that only the first match is captured. This may or may not be required, depending on whether possible subsequent matches are desired.

To cite the perlre manpage:

You can specify a character class, by enclosing a list of characters in [] , which will match any character from the list. If the first character after the "[" is "^", the class matches any character not in the list.

This should work in most regex dialects.
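A minimal sketch of the anchored negated class using Python's standard re module (the sample strings are made up). Because * allows zero or more repetitions, [^;]* simply runs to the end of the string when no semicolon is present, and it produces an empty match when the string starts with one.

```python
import re

# Anchored at the start; [^;]* consumes every character that is not a semicolon.
FIRST_FIELD = re.compile(r"^[^;]*")

def before_first_semicolon(text: str) -> str:
    """Return the text up to (but not including) the first ';'."""
    # match() always succeeds here, because [^;]* may match zero characters.
    return FIRST_FIELD.match(text).group()

print(before_first_semicolon("foo=bar;baz=bax;bab=baf"))  # foo=bar
print(before_first_semicolon("bab=baf"))                  # bab=baf (no ';' at all)
print(repr(before_first_semicolon(";leading")))           # ''
```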

sleske
  • The great part about this solution is that it also matches to the end of the line, e.g. in my case I had `foo=bar;baz=bax;bab=baf` and it matched `bab=baf` even though there is no `;`. Exactly what I need. Not sure why it works, though, if the spec says it matches everything but the target symbol... – skryvets Dec 16 '19 at 21:22

Would

/^(.*?);/

work?

The ? makes the preceding * lazy, so the regex grabs as little as possible before matching the ;.
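A small Python sketch (standard re module, made-up input) contrasting the greedy pattern from the question with the lazy one. Capture group 1 holds the text before the semicolon that each pattern settles on.

```python
import re

text = "a=1;b=2;c=3"

# Greedy: .* first runs to the end, then backtracks to the LAST ';'.
greedy = re.match(r"^(.*);", text).group(1)
# Lazy: .*? grows one character at a time and stops at the FIRST ';'.
lazy = re.match(r"^(.*?);", text).group(1)

print(greedy)  # a=1;b=2
print(lazy)    # a=1
```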

Mosh Feu
RJFalconer
  • ya, but following the bicarbonate extension to Tim Toady, I believe negated character classes win, as a lazy quantifier includes backtracking. +1 anyway. – Amarghosh Jan 06 '10 at 13:40
  • Worth reading on the performance topic: http://blog.stevenlevithan.com/archives/greedy-lazy-performance – Glenn Slaven Jan 06 '10 at 13:45

/^[^;]*/

The [^;] says: match anything except a semicolon. Square brackets define a set, matching any single character in that set of characters. A ^ immediately after the opening bracket inverts the set, so it matches any character not in it.
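To make the set versus inverted-set distinction concrete, here is a short Python sketch (standard re module, made-up input) that applies each class to the same string with findall.

```python
import re

text = "a;b;c"

# [;] is a set containing only ';', so it matches each semicolon.
in_set = re.findall(r"[;]", text)
# [^;] is the inverted set: any single character except ';'.
not_in_set = re.findall(r"[^;]", text)

print(in_set)      # [';', ';']
print(not_in_set)  # ['a', 'b', 'c']
```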

Glenn Slaven
  • Be aware that the first ^ in this answer gives the regex a completely different meaning: It makes the regular expression look only for matches starting from the beginning of the string. In this case, that would effectively be a no-op *if* you run the regular expression only once. If you want to look for multiple matches within a single string, the first ^ would have to go. – Dan Breslau Jan 06 '10 at 13:48
  • He did say that he wanted to match everything until the first occurrence of a semicolon, so I assumed that he meant from the start of the string. – Glenn Slaven Jan 06 '10 at 13:58
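A Python sketch of the point about the leading ^ (standard re module, made-up input): with the anchor, findall can only return a match at position 0; without it, every semicolon-free run is returned. The sketch uses + rather than * so that findall does not also report empty matches.

```python
import re

text = "alpha;beta;gamma"

# With the ^ anchor only a match starting at position 0 is possible.
anchored = re.findall(r"^[^;]+", text)
# Without it, findall returns every semicolon-free run.
unanchored = re.findall(r"[^;]+", text)

print(anchored)    # ['alpha']
print(unanchored)  # ['alpha', 'beta', 'gamma']
```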

None of the proposed answers worked for me (e.g. in Notepad++), but

^.*?(?=\;)

did.
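The same lookahead idea, sketched with Python's re module rather than Notepad++ (so this only illustrates the pattern's logic). The (?=;) lookahead requires a semicolon to follow without consuming it, so the match excludes the ;, and there is no match at all when the string has no semicolon. The backslash before ; is unnecessary, so it is dropped here.

```python
import re

# Lazy match up to, but not including, the first ';'.
# (?=;) is a lookahead: a ';' must come next, but it is not consumed.
PATTERN = re.compile(r"^.*?(?=;)")

m = PATTERN.match("foo=bar;baz=bax")
print(m.group())                      # foo=bar
print(PATTERN.match("a;b;c").group()) # a
print(PATTERN.match("no semicolon"))  # None
```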

Lonzak

Try /[^;]*/

Google regex character classes for details.
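A short Python sketch (standard re module, made-up input) of the unanchored form. re.search returns the first place the pattern can match; one caveat is that because * allows zero characters, a string that begins with ; yields an empty match at position 0 rather than the text after it.

```python
import re

# Unanchored: re.search returns the FIRST place the pattern can match.
first = re.search(r"[^;]*", "abc;def")
print(first.group())  # abc

# Caveat: * allows an empty match, so a leading ';' gives '' at position 0.
leading = re.search(r"[^;]*", ";def")
print(repr(leading.group()), leading.start())  # '' 0
```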

Dan Breslau

sample text:

"this is a test sentence; to prove this regex; that is g;iven below"

If, for example, we have the sample text above, the regex /(.*?\;)/ gives you everything up to the first occurrence of a semicolon (;), including the semicolon: "this is a test sentence;"
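The same extraction as a Python sketch with the standard re module, run on the sample sentence above. The lazy .*? stops at the first semicolon and the semicolon itself is part of the match; the escape and the capture group are dropped because neither is needed.

```python
import re

text = "this is a test sentence; to prove this regex; that is g;iven below"

# Lazy .*? stops at the first ';', and the ';' is included in the match.
first_clause = re.search(r".*?;", text).group()
print(first_clause)  # this is a test sentence;
```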

Aliaksei Kliuchnikau
poncius
  • It is not necessary to escape the `;` char because it is not a regex special character. Grouping `()` is not required either. You can go with `/.*?;/` – Aliaksei Kliuchnikau Jan 20 '12 at 13:30
  • Yes, you are quite right. The escaping was more like "better safe than sorry". – poncius Jan 20 '12 at 14:24
  • This is the answer I was looking for. So the ? makes the match end at the first occurrence? What's the name of this (let's call it) property of the regex? – Parziphal Jun 22 '12 at 21:11
  • @Parziphal the `?` character makes the match **lazy** (matching as few times as possible). Think of the regex matching characters up until the first semicolon then it doesn't go any farther because it gives up (lazy ;) ) – derekantrican Jul 23 '19 at 14:42

Try /[^;]*/

That's a negating character class.

Skilldrick

This was very helpful for me as I was trying to figure out how to match all the characters in an xml tag including attributes. I was running into the "matches everything to the end" problem with:

/<simpleChoice.*>/

but was able to resolve the issue with:

/<simpleChoice[^>]*>/

after reading this post. Thanks all.
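A Python sketch of the same fix using the standard re module; the one-line sample markup is made up for illustration. The greedy .* runs to the last > on the line, while [^>]* cannot cross a >, so the match stops at the end of the opening tag. This assumes no attribute value contains a literal >, which would cut the [^>]* match short.

```python
import re

# Hypothetical sample line with an opening and a closing tag.
line = '<simpleChoice identifier="A">Yes</simpleChoice>'

# Greedy .* swallows everything up to the LAST '>' on the line.
greedy_tag = re.search(r"<simpleChoice.*>", line).group()
# [^>]* cannot cross a '>', so the match ends at the first one.
bounded_tag = re.search(r"<simpleChoice[^>]*>", line).group()

print(greedy_tag)   # <simpleChoice identifier="A">Yes</simpleChoice>
print(bounded_tag)  # <simpleChoice identifier="A">
```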

Yardboy
  • I found that it is way more efficient to actually parse HTML/XML (each language or framework has its own classes for that) because of its machine format; regexes are for natural language. – Leon Fedotov Feb 06 '11 at 11:15
  • Nice. I used this to fix XML documents with syntax errors in the ` ` tag, since the parser wasn't able to handle them. – Martin Schneider Jul 03 '17 at 12:22

This is not a regex solution, but it is simple enough for your problem description: just split your string and take the first item from the resulting array.

$str = "match everything until first ; blah ; blah end ";
$s = explode(";",$str,2);
print $s[0];

output

$ php test.php
match everything until first
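For comparison, a rough Python analogue of PHP's explode(";", $str, 2) (plain string methods, no regex). str.split with maxsplit=1 splits at most once; str.partition is another option that also reports whether the separator was found. Note that the trailing space before the semicolon is kept, just as in the PHP output.

```python
text = "match everything until first ; blah ; blah end "

# Split at most once, like explode(";", $str, 2), and keep the first piece.
head = text.split(";", 1)[0]
print(repr(head))  # 'match everything until first '

# partition returns (before, separator, after); sep is '' if ';' is absent.
before, sep, after = text.partition(";")
```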
ghostdog74

This will match up to the first occurrence only in each string and will ignore subsequent occurrences.

/^([^;]*);*/
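A Python sketch of this pattern with the standard re module (made-up input). Capture group 1 holds the text before the first semicolon; the trailing ;* additionally consumes any run of semicolons immediately after it, but nothing beyond that.

```python
import re

PATTERN = re.compile(r"^([^;]*);*")

m = PATTERN.match("key=value;;rest;more")
print(m.group(1))  # key=value    (the captured text)
print(m.group(0))  # key=value;;  (;* also swallows the adjacent semicolons)
```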
mchid

"/^([^\/]*)\/$/" worked for me to get only the top-level "folders" from an array like:

a/   <- this
a/b/
c/   <- this
c/d/
/d/e/
f/   <- this
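A Python sketch of that filter using the standard re module and the example list above. The / does not need escaping in a Python string (it is escaped in the original only because it is the /.../ delimiter). Only entries made of one run of non-slash characters followed by a single trailing slash survive.

```python
import re

paths = ["a/", "a/b/", "c/", "c/d/", "/d/e/", "f/"]

# ^([^/]*)/$ : non-slash characters, then exactly one trailing slash.
TOP_LEVEL = re.compile(r"^([^/]*)/$")

top = [p for p in paths if TOP_LEVEL.match(p)]
print(top)  # ['a/', 'c/', 'f/']
```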
sPooKee

Really kinda sad that no one has given you the correct answer...

In regex, a ? placed after a quantifier makes it non-greedy. By default, a quantifier matches as much as it can (greedy).

Simply add a ? after the * and it will be non-greedy and match as little as possible!

Good luck, hope that helps.

L1amm
  • This heavily depends on the actual regex **implementation** and not every implementation has non-greedy mode. – karatedog Jul 13 '15 at 15:24

This works for getting the content from the beginning of a line up to and including the first word:

/^.*?([^\s]+)/gm
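A Python sketch of this pattern with the standard re module (made-up input). Python has no /g flag, so re.findall plays that role, and re.MULTILINE stands in for /m so that ^ matches at the start of every line. Because the pattern has one capture group, findall returns just the captured first word of each line.

```python
import re

text = "  hello world\nfoo bar\n\tbaz"

# .*? lazily skips leading whitespace; ([^\s]+) captures the first word.
first_words = re.findall(r"^.*?([^\s]+)", text, re.MULTILINE)
print(first_words)  # ['hello', 'foo', 'baz']
```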
Stranger

The negated-character-class answers above ([^;]*) also match a string that does not contain the character at all.

If you want a match only when the character exists (and no match otherwise), you should use this regex:

/^(.*?);/
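A Python sketch (standard re module; the helper names lazy_prefix and negated_prefix are hypothetical) contrasting the two behaviours: the lazy pattern needs a semicolon to succeed, while the negated class always matches, possibly the whole string.

```python
import re

def lazy_prefix(s: str):
    """Text before the first ';', or None when there is no ';'."""
    m = re.match(r"^(.*?);", s)
    return m.group(1) if m else None

def negated_prefix(s: str) -> str:
    """Text before the first ';', or the whole string when there is none."""
    return re.match(r"^[^;]*", s).group()

print(lazy_prefix("key;rest"), negated_prefix("key;rest"))  # key key
print(lazy_prefix("no semicolon"))                          # None
print(negated_prefix("no semicolon"))                       # no semicolon
```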
Aerodynamika

I faced a similar problem: matching all the characters up to the first comma after the word entity_id. The solution that worked was this in BigQuery:

SELECT regexp_extract(line_items, r'entity_id[^,]*')
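The same extraction can be tried outside BigQuery. Below is a Python sketch with the standard re module, where the line_items value is a hypothetical example; for a simple pattern like this, Python's re and BigQuery's RE2-based functions should agree. The pattern is written as entity_id[^,]*, without a * directly after entity_id, since that would make the final d optional.

```python
import re

# Hypothetical value of the line_items column, for illustration only.
line_items = '{"entity_id": 42, "qty": 1}'

# "entity_id" followed by everything up to (not including) the next comma.
match = re.search(r"entity_id[^,]*", line_items)
print(match.group())  # entity_id": 42
```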
Ethan