2

I'm thinking about application which will find some repeated patterns on stored IEnumerable. What I found till now was finding given subsequences in sequence - what is rather easy.

What I think I need is way to:

  • find if there is any unknown subseqence which ocurres more than once (note how does it look like, and where is placed)
  • find unknown sub-subseqences of subsequence recursive until the length of found pattern is longer that 4 (ie)

For now I have two ideas to investigate more but I'm sure that there is ready algorithm which handle such problem.

Ideas

  • Make Dictionary<List<MyObject>, List <List<MyObject>> > where I will keep as key first occurence of MyObject and then List of following occurences with expanding the list with each iteration. And then do the recursive finding for any found patterns.
  • Implement Hoffman coding because I think it belongs somehow to this problem.
h__
  • 761
  • 3
  • 12
  • 41
  • Have a look at this: http://stackoverflow.com/questions/4578260/how-to-find-all-duplicate-from-a-liststring – bdimag Apr 01 '14 at 21:06
  • Nice, but I think it will work only for single object imo (like single string in mentioned link). This is like looking for aa, bb, cc, aa, bb, dd, cc - and it returns {aa}, {bb}, {cc}. And in my case I want the algorithm to return {aa, bb} - subsequence of list of objects. So, the question is how to expand finding first repeated element to finding first with all following elements equal to another subsequence in given sequence. – h__ Apr 01 '14 at 21:21

1 Answers1

0

I'm not completely clear on what you're asking -- can you give an example of the object you're working with and the expected result?

Here's an example based off one of the responses...

consider

class Person
{
    public string Name;
}

you can do

var people = new List<Person>() { new Person() { Name="Joe" },
                                  new Person() { Name="Jim" },
                                  new Person() { Name="Jack" },
                                  new Person() { Name="Joe" },
                                  new Person() { Name="Jim" } };

var duplicateItems = from x in people
                     group x by x.Name into grouped
                     where grouped.Count() > 1
                     select grouped.Key;

duplicateItems, in this case, is a collection of strings (Name): {"Joe","Jim"}

if you're then looking for the actual Person objects, then you could do something like this (and I imagine there's a better way to combine this into the above):

var duplicatePeople = from x in people
                      where duplicateItems.Contains<string>(x.Name)
                      select x;

which would return a collection of Person objects (two Joe's and two Jim's)

bdimag
  • 953
  • 8
  • 11
  • 1
    OK. But as you see - in your list you have twice Joe and Jim one by one - first on 0 and 1 index, second on 3 and 4 index. My question is not about finding simple duplication but on finding duplication of sequence so in this case the algorithm should return - detected subsequence detectedSubsequence List which contains subsequence of persons Joe and Jim. Joe and Jim in such order exists twice in given list - on the begining and on the end. – h__ Apr 02 '14 at 05:59