
I have a list of names:

apple.fruit
appleOrder2.fruit
orange.fruit

I want to extract just the fruit name.

Expected:

apple
apple
orange

I have the regex `(.*)((Order)|(\.fruit))`

which returns the following in capture group 1:

apple
appleOrder2
orange

I think the `\.` is interfering with the alternation character, because when I tested with

(.*)((Order)|(ge))

the alternation works fine, returning this in group 1:

empty
apple
oran
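A minimal Perl sketch that reproduces both observations, assuming each name is matched on its own. The `group1` helper and the `(no match)` marker are illustrative and not from the question; `(no match)` stands for the "empty" result above.

```perl
use strict;
use warnings;

my @names = ('apple.fruit', 'appleOrder2.fruit', 'orange.fruit');

# Hypothetical helper: capture group 1 for each input, or '(no match)'.
sub group1 {
    my ($re, @inputs) = @_;
    return map { $_ =~ $re ? $1 : '(no match)' } @inputs;
}

# The greedy (.*) first takes the whole string, then gives characters back
# from the right until the alternation matches. On "appleOrder2.fruit" it
# reaches \.fruit before it ever gets back to Order.
my @original = group1(qr/(.*)((Order)|(\.fruit))/, @names);
my @ge_test  = group1(qr/(.*)((Order)|(ge))/,      @names);

print join(', ', @original), "\n";   # apple, appleOrder2, orange
print join(', ', @ge_test),  "\n";   # (no match), apple, oran
```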

I am using Perl.

freshWoWer

5 Answers


`.*` is just too greedy for your regex. Try:

(.+?)(?:Order2)?\.fruit
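A minimal Perl sketch applying this pattern to the three example names from the question (the loop and the `@fruits` array are illustrative):

```perl
use strict;
use warnings;

my @names = ('apple.fruit', 'appleOrder2.fruit', 'orange.fruit');
my @fruits;

for my $name (@names) {
    # (.+?)        one or more characters, as few as possible (lazy)
    # (?:Order2)?  an optional literal "Order2", grouped but not captured
    # \.fruit      the literal ".fruit" suffix
    push @fruits, $1 if $name =~ /(.+?)(?:Order2)?\.fruit/;
}

print "$_\n" for @fruits;   # apple, apple, orange (one per line)
```

Because `(?:Order2)?` is literal, a name such as `appleOrder3.fruit` would come back as `appleOrder3`.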
Linus Kleen
  • @Justin No, why? `.+` as in "one or more characters". Which is also not as greedy as `.*`. – Linus Kleen Jan 25 '11 at 21:16
  • @gore, See my question here. I don't think you want to be using the term 'greedy.' http://stackoverflow.com/questions/1139171/when-it-comes-to-regex-what-is-the-difference-between-greedy-and-reluctant-q – jjnguy Jan 25 '11 at 21:17
  • @goreSplatter: Both are just as greedy. – Tim Pietzcker Jan 25 '11 at 21:18
  • Thanks! Could you explain what `(?:Order2)?` does? You also took out the alternation character; is that a typo, or did you mean that? I guess the `?` tells the regex engine to do a lazy search, but what do the `:` and the second `?` mean? – freshWoWer Jan 25 '11 at 21:26
  • @freshWoWer `(?: ... )` means grouping, but not capturing. The question mark after that means "one match or zero". So the whole grouping of "Order2" is matched either once or not at all. See [Justin's previous answer](http://stackoverflow.com/questions/1139171/when-it-comes-to-regex-what-is-the-difference-between-greedy-and-reluctant-q) for details on that. @Justin [perlre](http://www.ryerson.ca/perl/manual/pod/perlre.html) uses "greedy". So do I. – Linus Kleen Jan 25 '11 at 21:29

Use a lazy quantifier:

(.*?)(Order|\.fruit)

In your regex, the `.*` first matches the entire string, then backtracks one character at a time until the alternation `Order|\.fruit` matches. For `appleOrder2.fruit` that happens after only six backtracks, when `\.fruit` matches, so the regex engine never gets back far enough to find the other, earlier alternative `Order`. Solution: tell the regex engine to match as few characters as possible by adding a `?` to the quantifier.
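A minimal Perl sketch contrasting the greedy and lazy forms on the question's three names (the arrays are illustrative):

```perl
use strict;
use warnings;

my @names = ('apple.fruit', 'appleOrder2.fruit', 'orange.fruit');
my (@greedy, @lazy);

for my $name (@names) {
    # Greedy: (.*) grabs everything, then backtracks until Order|\.fruit fits.
    my ($g) = $name =~ /(.*)(Order|\.fruit)/;
    # Lazy: (.*?) grows one character at a time, so the leftmost
    # Order or .fruit ends the capture.
    my ($l) = $name =~ /(.*?)(Order|\.fruit)/;
    push @greedy, $g;
    push @lazy,   $l;
}

print "greedy: @greedy\n";   # greedy: apple appleOrder2 orange
print "lazy:   @lazy\n";     # lazy:   apple apple orange
```

Only the quantifier changes between the two patterns; the alternation is identical.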

Tim Pietzcker

(.*[!?^Order]|[!?^.fruit])((Order[0-9])|(.fruit)|((Order[0-9])|(.fruit)))

Senad Meškin

In your original expression:

(.*)((Order)|(\.fruit))

the `(Order)` group is insufficient to match the "Order2" component of your second example string. I think something like:

(.*)((Order[0-9]?)|(\.fruit))

or similar would also match a single trailing digit (assuming it's not always "Order2").
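A minimal Perl sketch, assuming the digit after `Order` can vary. It pairs the `Order[0-9]?` group with the lazy `(.*?)` from the earlier answer, because with a greedy `(.*)` the engine, backtracking from the right, still reaches `\.fruit` before `Order`. The name `bananaOrder7.fruit` is a hypothetical extra input:

```perl
use strict;
use warnings;

# 'bananaOrder7.fruit' is a hypothetical extra name with a different digit.
my @names = ('apple.fruit', 'appleOrder2.fruit', 'orange.fruit', 'bananaOrder7.fruit');
my @fruits;

for my $name (@names) {
    # Lazy (.*?) stops at the leftmost point where Order<digit> or .fruit matches.
    push @fruits, $1 if $name =~ /(.*?)((Order[0-9]?)|(\.fruit))/;
}

print join(', ', @fruits), "\n";   # apple, apple, orange, banana
```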

Peter Briggs

Try `^([a-z]*)(\n*)`: it captures the leading run of lowercase letters on each line, which stops at the `O` of `Order` or the `.` of `.fruit`. A more general version: `^([a-zA-Z0-9]*)(\n*)`
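A minimal Perl sketch of the leading-letters idea, assuming all names sit in one multi-line string. It uses the `/m` flag so `^` matches at the start of each line and leaves out the newline group, which is not needed just to extract the names:

```perl
use strict;
use warnings;

# The names as one block of text, one per line (no trailing newline).
my $text = join "\n", 'apple.fruit', 'appleOrder2.fruit', 'orange.fruit';

# /m lets ^ match at the start of every line; /g returns every match.
# [a-z]* stops at the first character that is not a lowercase letter,
# i.e. the "O" of "Order" or the "." of ".fruit".
my @fruits = $text =~ /^([a-z]*)/mg;

print join(', ', @fruits), "\n";   # apple, apple, orange
```

This relies on the fruit names being all lowercase; a name containing an uppercase letter or a digit would be cut short.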

ka_lin