
I need to match the first part of a sentence, up to a given word. However, that word is optional; when it is absent, I want to match the whole sentence. For example:

I have a sentence with a clause I don't want.

I have a sentence and I like it.

In the first case, I want "I have a sentence". In the second case, I want "I have a sentence and I like it."

Lookarounds will give me the first case, but as soon as I try to make it optional, to cover the second case, I get the whole first sentence. I've tried making the expression lazy... no dice.

The code that works for the first case:

var regEx = new Regex(@".*(?=with)");
string matchstr = @"I have a sentence with a clause I don't want";

if (regEx.IsMatch(matchstr)) {
    Console.WriteLine(regEx.Match(matchstr).Captures[0].Value);
    Console.WriteLine("Matched!");
}
else {
    Console.WriteLine("Not Matched : (");
}

The expression that I wish worked:

var regEx = new Regex(@".*(?=with)?");

Any suggestions?

Tomerikoo
James King

3 Answers


There are several ways to do this. You could do something like this:

^(.*?)(with|$)

The first group is matched reluctantly, i.e. as few characters as possible. The overall match succeeds if this group is followed by either `with` or the end-of-line anchor `$`.

Given this input:

I have a sentence with a clause I don't want.
I have a sentence and I like it.

Then there are two matches (as seen on rubular.com):

  • Match 1:
    • Group 1: "I have a sentence "
    • Group 2: "with"
  • Match 2:
    • Group 1: "I have a sentence and I like it."
    • Group 2: "" (empty string)

You can make the grouped alternation non-capturing with `(?:with|$)` if you don't need to distinguish the two cases.
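For the C# code in the question, the same idea could look roughly like the sketch below. It assumes the two sentences arrive as one string separated by `\n`, so it passes `RegexOptions.Multiline` to make `^` and `$` match at line boundaries (rubular does this by default, .NET does not). The class name `LinePrefixes` and the method `Extract` are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinePrefixes
{
    // Hypothetical helper: returns group 1 of every match found in the text.
    public static List<string> Extract(string text)
    {
        // Multiline lets ^ and $ match at the start and end of each line,
        // not only at the start and end of the whole string.
        var regex = new Regex(@"^(.*?)(?:with|$)", RegexOptions.Multiline);
        var prefixes = new List<string>();
        foreach (Match match in regex.Matches(text))
        {
            // Groups[1] is the lazily matched prefix; Groups[0] would also contain "with".
            prefixes.Add(match.Groups[1].Value);
        }
        return prefixes;
    }

    public static void Main()
    {
        string text = "I have a sentence with a clause I don't want.\n" +
                      "I have a sentence and I like it.";
        foreach (string prefix in Extract(text))
        {
            // Brackets make the trailing space of the first prefix visible.
            Console.WriteLine("[" + prefix + "]");
        }
    }
}
```

Note that a bare `with` also matches inside longer words such as "without"; if that matters, `\bwith\b` restricts the match to the whole word.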

polygenelubricants
  • You can of course use no capturing group, and use lookahead for the alternation part, i.e. `^.*?(?=with|$)` http://www.rubular.com/r/1JVjxdk30T ; these are minor variations of the same basic idea. – polygenelubricants Aug 27 '10 at 17:01
  • Beautiful. Used this with the non-capturing group, but (?:) still captured the group for some reason... `(?=with|$)`, however, did exactly what I needed it to do. Thanks! – James King Aug 27 '10 at 17:13
  • @James: there's a difference between non-capturing and assertion. Assertion doesn't consume as part of the match. Non-capturing doesn't mean non-matching. It's still matched, but it's not captured into a group. – polygenelubricants Aug 27 '10 at 17:18
  • Hmm, not sure I understand... I put `(.*?)(?:with|$)` into my code and got back one captured group: `I have a sentence with` Why is the word 'with' included in this capture? – James King Aug 27 '10 at 17:37
  • @James: I'm guessing you used `Captures[0]` when I meant `Groups[1]`. See http://stackoverflow.com/questions/3320823/whats-the-difference-between-groups-and-captures-in-net-regular-expressions – polygenelubricants Aug 27 '10 at 17:54
  • Yep, I did! And I knew better, too :P Though it still isn't clear to me why `Groups[0]` returns `I have a sentence with`, and why `Groups[1]` returns `I have a sentence` when I use `^(.*?)(?:with|$)` – James King Aug 27 '10 at 19:24
  • @James: because `Groups[0]` is the "default" group that returns the matched string. There's no explicit brackets needed to capture for group 0. Whatever you matched is what it will contain. Using `?:` creates no new group, but it's still a regular match, so it will be included in group 0. – polygenelubricants Aug 28 '10 at 00:55
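To make the distinction from the comments concrete, here is a small sketch that compares the overall match (group 0), capture group 1, and the lookahead variant on the first sentence. The class and method names are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Non-capturing group: "with" is still consumed, so it is part of group 0.
    public static string ConsumedMatch(string input) =>
        Regex.Match(input, @"^(.*?)(?:with|$)").Groups[0].Value;

    // Group 1 holds only the lazily matched prefix.
    public static string ConsumedGroup1(string input) =>
        Regex.Match(input, @"^(.*?)(?:with|$)").Groups[1].Value;

    // Lookahead: "with" is asserted but not consumed, so the overall match stops before it.
    public static string AssertedMatch(string input) =>
        Regex.Match(input, @"^.*?(?=with|$)").Value;

    public static void Main()
    {
        string input = "I have a sentence with a clause I don't want.";
        Console.WriteLine("[" + ConsumedMatch(input) + "]");   // [I have a sentence with]
        Console.WriteLine("[" + ConsumedGroup1(input) + "]");  // [I have a sentence ]
        Console.WriteLine("[" + AssertedMatch(input) + "]");   // [I have a sentence ]
    }
}
```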

If I understand your need correctly, you want to match either the sentence up to the word 'with', or, if it's not there, match the entire thing? Why not write the regexp to explicitly look for the two cases?

/(.*) with |(.*)/

Wouldn't this get both cases?
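In C# terms, this could be sketched as follows, assuming the `/.../` delimiters are dropped (they are not part of a .NET pattern). Only one of the two groups takes part in a given match, so the code checks `Success` on group 1 first. The names `AlternationDemo` and `Prefix` are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Hypothetical helper built on the two-alternative pattern.
    public static string Prefix(string input)
    {
        Match match = Regex.Match(input, @"(.*) with |(.*)");
        // Group 1 participates only when " with " was found;
        // otherwise the second alternative captured the whole input in group 2.
        return match.Groups[1].Success ? match.Groups[1].Value : match.Groups[2].Value;
    }

    public static void Main()
    {
        Console.WriteLine(Prefix("I have a sentence with a clause I don't want."));
        Console.WriteLine(Prefix("I have a sentence and I like it."));
    }
}
```

Because `.*` is greedy here, a sentence containing " with " more than once is cut at the last occurrence, not the first.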

zigdon

string optional = "with a clause I don't want";
string rx = "^(.*?)" + Regex.Escape(optional) + ".*$";

// displays "I have a sentence"
string foo = "I have a sentence with a clause I don't want.";
Console.WriteLine(Regex.Replace(foo, rx, "$1"));

// displays "I have a sentence and I like it."
string bar = "I have a sentence and I like it.";
Console.WriteLine(Regex.Replace(bar, rx, "$1"));

If you don't need the complex matching provided by a regex then you could use a combination of IndexOf and Remove. (And obviously you could abstract the logic away into a helper and/or extension method or similar):

string optional = "with a clause I don't want";

// displays "I have a sentence"
string foo = "I have a sentence with a clause I don't want.";
int idxFoo = foo.IndexOf(optional);
Console.WriteLine(idxFoo < 0 ? foo : foo.Remove(idxFoo));

// displays "I have a sentence and I like it."
string bar = "I have a sentence and I like it.";
int idxBar = bar.IndexOf(optional);
Console.WriteLine(idxBar < 0 ? bar : bar.Remove(idxBar));
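
One way the extension-method idea could look is sketched below. The name `TruncateAt` is invented for this example, and it uses `Substring(0, index)` instead of `Remove(index)`, which gives the same result for a marker that was found. It also passes `StringComparison.Ordinal` explicitly, which is an assumption about the comparison you want.

```csharp
using System;

public static class StringExtensions
{
    // Hypothetical extension method: returns the text before the first
    // occurrence of marker, or the whole string when marker is absent.
    public static string TruncateAt(this string source, string marker)
    {
        int index = source.IndexOf(marker, StringComparison.Ordinal);
        return index < 0 ? source : source.Substring(0, index);
    }

    public static void Main()
    {
        string optional = "with a clause I don't want";
        Console.WriteLine("I have a sentence with a clause I don't want.".TruncateAt(optional));
        Console.WriteLine("I have a sentence and I like it.".TruncateAt(optional));
    }
}
```

As with the regex versions, the truncated result keeps the space that preceded the marker; call `TrimEnd()` on it if that is unwanted.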
LukeH