Is it possible to write a regular expression that matches either a "c" or a "ç", so that it works for both of these examples?

var a = "ca va";
var b = "ça va";
Regex.Match(a,"\b(ca\sva)").Success // Match
Regex.Match(b,"\b(ça\sva)").Success // Doesn't match

Thanks

Constantin
  • Your code works correctly (after adding `@` to avoid compilation errors) – L.B May 04 '12 at 16:34
  • I've got 2 matches. And you're missing @ symbol before string constants in your regexes. – empi May 04 '12 at 16:34
  • @L.B.: Not just compilation errors. Without the `@`, `\b` means "match a backspace character". – Tim Pietzcker May 04 '12 at 16:35
  • The problem is not in compilation; the problem is in matching. I want to match "ça va" in both cases, whether users type "ca va" or "ça va" – Constantin May 04 '12 at 16:36
  • @Constantine then use `@"\b([çc]a\sva)"` – L.B May 04 '12 at 16:38
  • Thanks @L.B, it was because of that @ :D, and because of the RegExRX app, where I validate the matches before using them in C# – Constantin May 04 '12 at 16:38
  • How about converting the special letters to English letters first? Here's the idea: http://stackoverflow.com/a/331321/1008230 – fankt May 04 '12 at 16:46
  • I suggest you copy the subject text into a temporary string, replace each French character with its English counterpart (`c` for `ç`, `ae` for `æ`, `e` for `è` and so on), and then do your Regex match. If you need to match more than a handful of such phrases, you will quickly see that making each regex "general" like this will get out of hand. (Imagine writing 20 regexes, each with 3 variable characters like this.) – Superbest May 04 '12 at 19:06
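
The substitution idea from the last two comments could be sketched roughly as follows. This is only one possible approach, not code from any of the commenters: it decomposes the text with `String.Normalize(NormalizationForm.FormD)` and drops the combining marks, so "ç" becomes "c". The class name `AccentFolding` and the helper `RemoveDiacritics` are made up for illustration. Note that ligatures such as "æ" do not decompose this way and would still need an explicit replacement.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

static class AccentFolding
{
    // Hypothetical helper (not a framework API): decomposes each character
    // and removes the combining marks, e.g. "ç" -> "c" + cedilla -> "c".
    public static string RemoveDiacritics(string text)
    {
        var decomposed = text.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (var ch in decomposed)
        {
            // Keep everything except non-spacing (combining) marks.
            if (CharUnicodeInfo.GetUnicodeCategory(ch) != UnicodeCategory.NonSpacingMark)
                sb.Append(ch);
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    static void Main()
    {
        // The plain ASCII pattern is enough once the input has been folded.
        var pattern = @"\b(ca\sva)";
        foreach (var input in new[] { "ca va", "ça va" })
        {
            var folded = RemoveDiacritics(input);
            Console.WriteLine(input + " -> " + folded + ": " + Regex.IsMatch(folded, pattern));
        }
    }
}
```

The trade-off is that the regexes stay simple, but the match positions refer to the folded copy rather than to the original text.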

1 Answer

For me, the following code returns true in either case:

using System;
using System.Text.RegularExpressions;

namespace FrenchRegex
{
    class Program
    {
        static void Main(string[] args)
        {
            var a = "ca va";
            var b = "ça va";

            var regex = @"\b((c|ç)a\sva)";

            var matchA = Regex.Match(a, regex).Success;
            var matchB = Regex.Match(b, regex).Success;

            Console.WriteLine("Matches '" + a + "': " + matchA);
            Console.WriteLine("Matches '" + b + "': " + matchB);

            Console.ReadKey();
        }
    }
}

I copied and pasted into VS2010, so you might need to do the same to reproduce my result.

In any case, I think a regex that matches both "ça va" and "ca va" would be `\b([cç]a\sva)`.
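
As the comments on the question point out, the pattern has to be written as a verbatim string (`@"..."`) or with doubled backslashes when it appears in C# source. Here is a small sketch of the difference; the class name `VerbatimDemo` and the constant names are only illustrative:

```csharp
using System;
using System.Text.RegularExpressions;

static class VerbatimDemo
{
    // Verbatim string: backslashes reach the regex engine unchanged.
    public const string Pattern = @"\b([cç]a\sva)";

    // Regular string: every backslash has to be doubled to get the same pattern.
    public const string Escaped = "\\b([cç]a\\sva)";

    static void Main()
    {
        // Both literals produce the same pattern text.
        Console.WriteLine(Pattern == Escaped);

        // In a regular string, "\b" is a single backspace character (U+0008),
        // not the two characters that the regex engine reads as a word boundary.
        Console.WriteLine("\b".Length == 1 && "\b"[0] == '\u0008');

        // The character class accepts either spelling.
        Console.WriteLine(Regex.IsMatch("ca va", Pattern));
        Console.WriteLine(Regex.IsMatch("ça va", Pattern));
    }
}
```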

Superbest
  • I'm not sure that's better. Not sure how the regex parser works under the hood, but I would think a character class ([]) would require more overhead, even if a small amount, than a group of literals separated by the pipe. – Brian Warshaw May 04 '12 at 19:01
  • That's right, thanks! @BrianWarshaw I doubt in this case performance is more of a concern than readability, plus this way I don't introduce more backreference groups than the original version had. – Superbest May 04 '12 at 19:02
  • I don't know . . . I think it's generally a good idea to keep them lean--better not to be in the habit of writing less-efficient expressions if you can help it. But that's my opinion, and I suppose I can stick it in my ear :-) – Brian Warshaw May 04 '12 at 19:05
  • Well, for the sake of completeness: @BrianWarshaw is, I believe, suggesting `\b((c|ç)a\sva)`, which also works with the above code. – Superbest May 04 '12 at 19:08