0

If I have two values, e.g. ABC001 and ABC100, or A0B0C1 and A1B0C0, is there a RegEx I can use to make sure the two values have the same pattern?

Jon
  • Can you give us more examples or explain the pattern better? – JaredPar Dec 09 '10 at 15:46
  • Do you know the pattern in advance? Is the pattern constant? Or do you want to be able to match them, if they're the same "pattern" even if you haven't seen that pattern before? – Andrew M Dec 09 '10 at 15:47
  • What defines the "same pattern"? Do you mean that they have digits at the same places in both strings, and letters at the same places in both strings? So `AA1` has the same "pattern" as `AA0` but not `A1A`? A little more clarification would be helpful. – Donut Dec 09 '10 at 15:47
  • The issue is I'm not sure what the pattern is. It could be different. I have 2 values that contain alphanumeric characters and I want to make sure the first value has the same pattern as the second. – Jon Dec 09 '10 at 15:47
  • Do you mean checking if a string is a permutation of another string? – Ani Dec 09 '10 at 15:48
  • What do you mean by "same pattern"? I can come up with dozens of patterns which will match any of these strings. – Yodan Tauber Dec 09 '10 at 15:49
  • @andrewm Pattern is not constant and I want to match both even if not seen before. – Jon Dec 09 '10 at 15:49
  • @donut You got it, AA1 and AA0 are the same pattern therefore they match – Jon Dec 09 '10 at 15:50
  • @Jon: Hang on; if I'm right that we are talking about permutations, how do AA0 and AA1 have the same pattern? – Ani Dec 09 '10 at 15:52
  • @Jon See my updated method, does that help? – Donut Dec 09 '10 at 16:14
  • How can @donut and @Ani both be right? donut said `AA0=BB1`, and Ani said `AA10=A0A1`. – Kobi Dec 09 '10 at 16:15

5 Answers

2

If you don't know the pattern in advance, but are only going to encounter two groups of characters (alpha and digits), then you could do the following:

Write some C# that parses the first string, looks at each char to determine whether it is a letter or a digit, and then generates a regex from that pattern.

You may find that there's no point writing code to generate a regex, as it could be just as simple to check the second string against the first.
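The regex-generation idea can be sketched as follows. This is only an illustration, assuming the rule clarified in the comments (letters must be identical, digits may vary); the `PatternRegex` class and `FromTemplate` method are hypothetical names, not part of any library.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class PatternRegex
{
    // Hypothetical helper: turns "ABC001" into the pattern \AABC\d\d\d\z.
    public static Regex FromTemplate(string template)
    {
        var pattern = new StringBuilder(@"\A");
        foreach (char c in template)
        {
            // Digits may vary, so match any digit; every other character must match literally.
            pattern.Append(Char.IsDigit(c) ? @"\d" : Regex.Escape(c.ToString()));
        }
        pattern.Append(@"\z");
        return new Regex(pattern.ToString());
    }
}
```

With this sketch, `PatternRegex.FromTemplate("ABC001").IsMatch("ABC100")` should be true, while matching against `"EFG001"` should be false because the letters differ.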

Alternatively, without regex:

First check that the strings are the same length. Then loop through both strings at the same time, char by char. If char[x] from string 1 is alpha and char[x] from string 2 is alpha too (and likewise for digits) at every position, your patterns match.

Try this, it should cope if a string sneaks in some symbols. Edited to compare character values ... and use Char.IsLetter and Char.IsDigit

private bool matchPattern(string string1, string string2)
{
    // Strings of different lengths can never share a pattern; returning early
    // also avoids indexing past the end of the shorter string.
    if (string1.Length != string2.Length)
    {
        return false;
    }

    for (int i = 0; i < string1.Length; i++)
    {
        char c1 = string1[i];
        char c2 = string2[i];

        // Letter positions must line up, and the letters must be identical.
        if (Char.IsLetter(c1) != Char.IsLetter(c2) || (Char.IsLetter(c1) && c1 != c2))
        {
            return false;
        }
        // Digit positions must line up, but the digit values may differ.
        if (Char.IsDigit(c1) != Char.IsDigit(c2))
        {
            return false;
        }
    }
    return true;
}
Andrew M
2

Well, here's my shot at it. This doesn't use regular expressions, and assumes s1 and s2 only contain letters or digits:

public static bool SamePattern(string s1, string s2)
{
   if (s1.Length == s2.Length)
   {
      char[] chars1 = s1.ToCharArray();
      char[] chars2 = s2.ToCharArray();

      for (int i = 0; i < chars1.Length; i++)
      {
         if (!Char.IsDigit(chars1[i]) && chars1[i] != chars2[i])
         {
            return false;
         }
         else if (Char.IsDigit(chars1[i]) != Char.IsDigit(chars2[i]))
         {
            return false;
         }
      }

      return true;
   }
   else
   {
      return false;
   }
}

A description of the algorithm is as follows:

  1. If the strings have different lengths, return false.
  2. Otherwise, check the characters in the same position in both strings:
    1. If they are both digits, or both the same letter, move on to the next iteration.
    2. If they aren't digits and aren't the same, return false.
    3. If one is a digit and the other is a letter, return false.
  3. If all characters in both strings were checked successfully, return true.
Donut
  • The problem is if you test SamePattern("EFG001", "ABC002"); the result is true but I want it to return false as the letters are different – Jon Dec 09 '10 at 16:05
  • @jon string1 == string2; I think we need a more detailed description of your rules of what a matching pattern is. Do the letters need to be the same, but the numbers may change? – Andrew M Dec 09 '10 at 16:08
  • Letters need to be the same but the numbers can change. I have a feeling I can't see the wood for the trees. – Jon Dec 09 '10 at 16:11
  • Ok, I've updated it. This should work for you, `SamePattern("ABC001", "ABC002")` returns true while `SamePattern("EFG001", "ABC002")` returns false. – Donut Dec 09 '10 at 16:14
  • If I may, you can greatly improve readability by using `c1 >= '0'`, or `Char.IsDigit`. – Kobi Dec 09 '10 at 16:17
  • @Kobi Good call! Updated, got rid of the need for the c1 and c2 variables. – Donut Dec 09 '10 at 16:22
1

Consider using Char.GetUnicodeCategory
You can write a helper class for this task:

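using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public class Mask
{
    public Mask(string originalString)
    {
        if (originalString == null) throw new ArgumentNullException("originalString");
        OriginalString = originalString;
        // Record the Unicode category (uppercase letter, decimal digit, ...) of each character.
        CharCategories = originalString.Select(Char.GetUnicodeCategory).ToList();
    }

    public string OriginalString { get; private set; }
    public IEnumerable<UnicodeCategory> CharCategories { get; private set; }

    public bool HasSameCharCategories(Mask other)
    {
        // A missing mask never matches.
        if (other == null) return false;
        // Two masks match when their category sequences are equal, position by position.
        return CharCategories.SequenceEqual(other.CharCategories);
    }
}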

Use as

Mask mask1 = new Mask("ab12c3");
Mask mask2 = new Mask("ds124d");
MessageBox.Show(mask1.HasSameCharCategories(mask2).ToString());
Kobi
  • I haven't run this but I would expect the result to be false as although the pattern matches the letters used are different. – Jon Dec 09 '10 at 16:02
  • +1 for GetUnicodeCategory, although I confess I found the code hard to follow, I think the addition of a bit of explicit typing would help? Such as List<UnicodeCategory> CharCategories – Andrew M Dec 09 '10 at 16:07
  • @Jon - then I've misunderstood your question completely, and may delete the answer promptly. So `AA0` and `AA1` are the same, but `AA0` and `BB0` are not? – Kobi Dec 09 '10 at 16:09
  • @Andrew - It isn't my best work, I see that `:)` Just a quick demo. – Kobi Dec 09 '10 at 16:09
0

I don't know C# syntax but here is a pseudo code:

  • split the strings on ''
  • sort the 2 arrays
  • join each array with ''
  • compare the 2 strings
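In C#, the pseudocode above might look like the sketch below. The `PermutationCheck` class and `IsPermutation` method are hypothetical names chosen for illustration. Note that this tests whether one string is a rearrangement of the other, which is a different rule from "same letters, digits may vary".

```csharp
using System;

public static class PermutationCheck
{
    // Hypothetical helper: true when b contains exactly the same characters as a, in any order.
    public static bool IsPermutation(string a, string b)
    {
        if (a.Length != b.Length) return false;

        // "Split" each string into its characters and sort them.
        char[] first = a.ToCharArray();
        char[] second = b.ToCharArray();
        Array.Sort(first);
        Array.Sort(second);

        // "Join" the sorted characters again and compare the results.
        return new string(first) == new string(second);
    }
}
```

Under this sketch, "ABC0102" and "AC201B0" compare as equal, whereas "AA0" and "AA1" do not.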
Toto
0

A general-purpose solution with LINQ can be achieved quite easily. The idea is:

  1. Sort the two strings (reordering the characters).
  2. Compare the sorted strings as character sequences using SequenceEqual.

This scheme enables a short, graceful and configurable solution, for example:

// We will be using this in SequenceEqual
class MyComparer : IEqualityComparer<char>
{
    public bool Equals(char x, char y)
    {
        return x.Equals(y);
    }

    public int GetHashCode(char obj)
    {
        return obj.GetHashCode();
    }
}

// and then:
var s1 = "ABC0102";
var s2 = "AC201B0";

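// Sort by the character itself. char.GetNumericValue returns -1 for every letter,
// so using it as the sort key would leave the letters in their original order.
Func<char, char> orderFunction = c => c;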
var comparer = new MyComparer();
var result = s1.OrderBy(orderFunction).SequenceEqual(s2.OrderBy(orderFunction), comparer);

Console.WriteLine("result = " + result);

As you can see, the comparison itself takes 3 lines of code (not counting the comparer class), and it is easy to configure.

  • The code as it stands checks if s1 is a permutation of s2.
  • Do you want to check whether s1 has the same number and kinds of characters as s2, but not necessarily the same characters (e.g. "ABC" to be equal to "ABB")? No problem: change MyComparer.Equals to return char.GetUnicodeCategory(x).Equals(char.GetUnicodeCategory(y));.
  • By changing the values of orderFunction and comparer you can configure a multitude of other comparison options.

And finally, since I don't find it very elegant to define a MyComparer class just to enable this scenario, you can also use the technique described in this question:

Wrap a delegate in an IEqualityComparer

to define your comparer as an inline lambda. This would result in a configurable solution contained in 2-3 lines of code.
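A minimal sketch of that idea, assuming a small hand-written adapter: `DelegateComparer<T>` and `CategoryDemo` are hypothetical names, not framework types. The example sorts by Unicode category rather than by character, so that the category-based comparison lines up position by position.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical adapter: wraps an equality delegate so it can be passed to SequenceEqual.
public sealed class DelegateComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> _equals;

    public DelegateComparer(Func<T, T, bool> equals)
    {
        _equals = equals;
    }

    public bool Equals(T x, T y)
    {
        return _equals(x, y);
    }

    // A constant hash is consistent with any equality delegate, which is enough here;
    // it would perform poorly if this comparer were used with hash-based collections.
    public int GetHashCode(T obj)
    {
        return 0;
    }
}

public static class CategoryDemo
{
    // True when both strings contain the same number of characters of each Unicode category.
    public static bool SameCategoryCounts(string s1, string s2)
    {
        var comparer = new DelegateComparer<char>(
            (x, y) => char.GetUnicodeCategory(x) == char.GetUnicodeCategory(y));
        Func<char, UnicodeCategory> byCategory = c => char.GetUnicodeCategory(c);
        return s1.OrderBy(byCategory).SequenceEqual(s2.OrderBy(byCategory), comparer);
    }
}
```

The trade-off is that the adapter class still has to exist somewhere, but it is written once and each comparison rule then becomes a single lambda.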

Community
Jon