8

I have a string:

foo-bar-bat.bla

I wish to match only foo

My flawed pattern matches both foo and bar:

\w+(?=-.*\.bla)

How do I discard bar? Or maybe even better, how could I stop matching stuff after foo?

johnnyRose
wawiwa
  • 1
    You can stop matching after the first match (either with `Matcher` or `replaceFirst`), or am I missing something? – nhahtdh Mar 13 '13 at 17:51
  • My earlier comment was about Java, but I think other languages have equivalent constructs for stopping at the first match. The only case where my comment does not apply is when you are using some kind of tool. But there is always a trick to work around it, if you give more examples and context. – nhahtdh Mar 13 '13 at 18:01
  • 1
    What is the relation between foo and bar? Does bar need to be present? – Lodewijk Bogaards Mar 13 '13 at 18:01
  • 1
    How much does your input string vary? Is it always going to be three chars, dash, three chars, dash, three chars, dot, 3 chars? – spots Mar 13 '13 at 18:20
  • Yes, it's always going to have the same format. I tried testing Hugo's regex (http://www.pythonregex.com) by prepending a caret: `^\w+(?=-.*\.bla)`. It seems like that should work, but on pythonregex.com it produced no results. I used the following test data: `asf.asf-asf.bla bla-bla-boo.bla foo-bar-bat.bla`. Without the caret the test produces: `>>> regex.findall(string)` `[u'asf', u'bla', u'bla', u'foo', u'bar']` – wawiwa Mar 14 '13 at 19:41

3 Answers

9

You could use the following pattern (as long as your strings are always formatted the way you said):

^\w+(?=-.*\.bla)

The ^ sign matches the beginning of the string, so the pattern can only match the very first word of the string.

The (?=...) is a lookahead: it makes sure the group that follows is present without including it in the match.
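To make the difference concrete, here is a minimal Python sketch (Python chosen because the asker mentions it; the variable names are only illustrative). It compares the unanchored pattern from the question with the anchored one, and also shows the "stop at the first match" approach suggested in the comments, using `re.search`.

```python
import re

text = "foo-bar-bat.bla"

# Pattern from the question: unanchored, so every word that is followed
# by a dash and a later ".bla" is reported.
unanchored = re.findall(r"\w+(?=-.*\.bla)", text)

# Anchored pattern: a match may only start at the beginning of the string.
anchored = re.findall(r"^\w+(?=-.*\.bla)", text)

# Alternative: keep the original pattern and stop at the first match.
first = re.search(r"\w+(?=-.*\.bla)", text)

print(unanchored)     # ['foo', 'bar']
print(anchored)       # ['foo']
print(first.group())  # foo
```

Either approach returns only foo for this input; the anchored version additionally refuses to match when the string does not begin with a word character.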

Hugo Dozois
  • I am not sure about the intention, but it will not match anything in the case `[foo]bar-r.bla`, while the regex in the question will match the example I gave. – nhahtdh Mar 13 '13 at 17:53
  • @nhahtdh That is true, but based on the only example we have, what I said would work. Though as you said, it depends on the case. – Hugo Dozois Mar 13 '13 at 17:54
  • This pattern works on my python 2.7 web framework. Maybe I'm using www.pythonregex.com incorrectly but for some reason I can't get any results when I prepend a caret. – wawiwa Mar 14 '13 at 20:06
  • Well, the caret is a pretty standard character. Maybe the python version is too old on the site though – Hugo Dozois Mar 15 '13 at 07:28
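
A possible explanation for the empty result, sketched under the assumption that the three test strings were pasted into the tester as one block of text (one per line here): without `re.MULTILINE`, `^` only matches at the very start of the whole input, and the first test string, asf.asf-asf.bla, begins with a word that is followed by a dot rather than a dash.

```python
import re

# Test data from the comments, assumed here to be one item per line.
data = "asf.asf-asf.bla\nbla-bla-boo.bla\nfoo-bar-bat.bla"
pattern = r"^\w+(?=-.*\.bla)"

# Default mode: ^ matches only at position 0, where "asf" is followed by ".".
whole_string = re.findall(pattern, data)

# MULTILINE mode: ^ also matches right after each newline.
per_line = re.findall(pattern, data, re.MULTILINE)

# Equivalent: test each line on its own.
one_by_one = [
    m.group()
    for m in (re.search(pattern, line) for line in data.splitlines())
    if m
]

print(whole_string)  # []
print(per_line)      # ['bla', 'foo']
print(one_by_one)    # ['bla', 'foo']
```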
0
^[^-]+

The leading ^ anchors the match at the beginning of the string. The character class [^-] matches any character that is not a dash. The + means that the character class must match one or more times.
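
As a quick check, a small Python sketch of this pattern follows. The second input is a made-up example (not from the question) that shows one trade-off: the pattern does not require a dash or the .bla suffix to be present at all.

```python
import re

text = "foo-bar-bat.bla"

# Everything from the start of the string up to (not including) the first dash.
match = re.match(r"^[^-]+", text)
print(match.group())  # foo

# Hypothetical input without any dash: the whole string is matched,
# because nothing in the pattern demands a dash or ".bla".
no_dash = re.match(r"^[^-]+", "foobar.bla")
print(no_dash.group())  # foobar.bla
```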

Lodewijk Bogaards
0

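Your expression is not anchored, so after matching "foo" it matches again at "bar", which is also followed by a dash and a later ".bla". Anchoring it to the start of the string avoids that: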

^\w+(?=-.*)

This expression reads as "At the start of the string, one or more word characters followed by (but not included in the match) a dash followed by anything".

^ \w+ (?=-.*)
|  |    |
|  |   matches "-bar-bat.bla"
|  matches "foo"
start of string
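
The "not included in the match" part can be verified with a short Python sketch. The two comparison patterns (one that consumes the rest, one with a capture group) are illustrations added here, not part of the answer.

```python
import re

text = "foo-bar-bat.bla"

# Lookahead version: "-bar-bat.bla" is checked but not consumed.
m = re.search(r"^\w+(?=-.*)", text)
print(m.group())  # foo
print(m.span())   # (0, 3)

# For comparison, without the lookahead the rest becomes part of the match.
consumed = re.search(r"^\w+-.*", text)
print(consumed.group())  # foo-bar-bat.bla

# A capture group is another common way to pull out just the first word.
grouped = re.search(r"^(\w+)-", text)
print(grouped.group(1))  # foo
```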
spots