1

I have a list of objects: e.g. A, A, B, C, D, E, E

And I have defined templates that tell how object types can be grouped e.g.

 Group Alpha --> A 1..n --> any number of 'A's can be grouped
 Group Charlie --> Sequences of 'BCD' can be grouped
 Group Epsilon --> E 1..n --> any number of 'E's can be grouped

Now I want to apply those group definitions on the original list, which should give the result:

 Group Alpha (2x'A'), Group Charlie (1x'BCD'), Group Epsilon (2x'E')

How can this be best achieved? Is there a known search algorithm/pattern for my issue? I have tried a very basic way of looping many times over the list and trying to look ahead from each list entry and match patterns but was totally lost due to the complexity ...

Thanks in advance for any hint !!!

SvenG
  • 5,155
  • 2
  • 27
  • 36

4 Answers4

2

It's a modified string matching problem. You have two type of input:

  1. The ones like "BCD". If you only have this, it can be matched using any conventional algorithm here

  2. The arbitrary number of the same object in a row.

There are 2 solutions off the top of my head:

  1. Use conventional string algorithm(KMP or whatever), but make an exception rule for the second type of input.

  2. Build a directed graph like:

enter image description here

Well the figure above is poorly drawn. Let me know if you have any questions.

Xi 张熹
  • 10,492
  • 18
  • 58
  • 86
2

I am not completely sure that this is what you need but with this small code I can create the output you specified

Simple usage (with assertions):

var a1 = new List<string> { "A", "A", "B", "C", "D", "E", "E" };

a1.ApplyCriteria("A").Criteria.Should().Be("A");
a1.ApplyCriteria("A").Count.Should().Be(2);

a1.ApplyCriteria("E").Criteria.Should().Be("E");
a1.ApplyCriteria("E").Count.Should().Be(2);

a1.ApplyCriteria("BCD").Criteria.Should().Be("BCD");
a1.ApplyCriteria("BCD").Count.Should().Be(1);

a1.ApplyCriteria("CD").Criteria.Should().Be("CD");
a1.ApplyCriteria("CD").Count.Should().Be(1);

// not found
a1.ApplyCriteria("CDA").Criteria.Should().Be("CDA");
a1.ApplyCriteria("CDA").Count.Should().Be(0);

My GroupResult class returned by the ApplyCriteria method looks like:

class GroupResult
{
    public string Criteria { get; set; }
    public int Count { get; set; }
}

And these are the extension methods that are doing the real work

static class Ext
{
    public static GroupResult ApplyCriteria(this IEnumerable<string> source, string criteria)
    {
        var elements = source.ToConcatenedString();

        return new GroupResult { Criteria = criteria, Count = elements.CountOcurrences(criteria) };
    }

    public static int CountOcurrences(this string source, string phrase)
    {
        return source
            .Select((c, i) => source.Substring(i))
            .Count(sub => sub.StartsWith(phrase));
    }

    public static string ToConcatenedString<TSource>(this IEnumerable<TSource> source)
    {
        var sb = new StringBuilder();

        foreach (var value in source)
        {
            sb.Append(value);
        }

        return sb.ToString();
    }
}
Jupaol
  • 21,107
  • 8
  • 68
  • 100
  • Thanks a ton ! I really like your answer because of the usage of extension methods and LINQ, which makes a rather complicated matter very easy to understand and comprehend. I copy/pasted your code made some modification/extensions and it works like a charm, thanks again !! – SvenG Jun 05 '12 at 15:51
1

Assuming you have some kind of a code to compare between the objects, and to tell whats A and whats B, You can define your template as an array, and then iterate through Your original list, searching for template's occurences.

CustomObj[] template = new CustomObj[]{B,C,D};
for (int i=0; i< originalList.Length- template.Length + 1; i++)
{
     bool found= true;
     for(int j=0; j< template.Length;j++)
     {
        found = template[j] == originalList[i +j];
     }
     if (found)
     {
        //add to results list
      }
}

Search comparison algorithms (the simplest among them, as far as I remember) use such concepts, and also some of the compression algorithms, but they work on it from the other side (building templates to reduce storage by creatinbg indexes of templates)

EDIT
Turns out I actually implemented the simple Rabin-Karp algorithm
I remembered it was something like that :)

YavgenyP
  • 2,113
  • 14
  • 11
1

In the bare basics, you can build a state machine. It will have 6 states, "Init", "alpha", "B", "C", "charlie" and "epsilon".

Start from init:

  • If the next object is "A", go to state alpha, increment alpha counter by 1.
  • If the next obj is B, go to state B.
  • If the next object is "E", go to state epsilon,increment Epsilon counter.
  • If any other object, stay in the init state.

In state aplha:

  • If next object is A, stay in state alpha.
  • If next object is B, go to state B
  • If next obj is E ,go to state epsilon and increment epsilon counter.
  • If anything else, go to init.

In state B:

  • If next is A, go to alpha and inc counter
  • If next is E, go to epsilon, inc its counter.
  • If next is C, go to state C
  • Anything else -> go to init

In state C:

  • If next is A, go to alpha and inc counter
  • If next is E, go to epsilon, inc its counter.
  • If next is D, go to state charlie, increase charlie counter
  • Anything else -> go to init

In state D:

  • If next is A, go to alpha and inc counter
  • If next is E, go to epsilon, inc its counter.
  • If next is B, go to state B
  • Anything else -> go to init

In state epsilon:

  • If the next object is "A", go to state alpha, increment alpha counter by 1.
  • If the next obj is B, go to state B.
  • If the next object is "E", do nothing.
  • If any other object, go to init state.

I know it looks complex, but it really isn't, at least at this point, especially if you create a state diagram. Of course it would quickly become very complex if you want something more general, or if you want to keep adding new patterns, or if you have more patterns. In that case, your best shot I believe would be to adapt one of the string matching algorithms to your problem.

Community
  • 1
  • 1
Hari Menon
  • 33,649
  • 14
  • 85
  • 108