0

This may be a bit difficult to describe, but I am building some statistical models and I need help with Linq syntax... My analogy is not exactly what I'm doing but it is the simplest description I can come up with...

I have an "ordered" list of inputs... Let's assume that I have an indeterminate list of Int's within a range... This list could contain anywhere from 1 to several million entities...

List<int> = [1, 3, 15, 16, 4, 27, 65, 2, 99, 3, 16, 21, 72, 1, 5, 7, 2, 8... ] (range 1 - 100).

What I am looking to extrapolate is "all" the ranges that contain a specific "search" set (or sub list) of entities. Each sub-list must contain "all" the entities from within the original input list, that is to maintain the inner erroneous data, and must not change the order... for instance, if I wanted to search for "all" ranges that contain [1, 2, 3, 4] from the list above I should get

  • [1, 3, 15, 16, 4, 27, 65, 2] - the first sub-list that contains the union of the search list.
  • [3, 15, 16, 4, 27, 65, 2, 99, 3, 16, 21, 72, 1] - next list...
  • [4, 27, 65, 2, 99, 3, 16, 21, 72, 1] - next list... etc...

The real critical piece of information is the "starting and ending indices" of each list... This data will need to be stored for use in a Neural Network as vector data... With this data the NN could simply use the index object to do the critical data calculations...

After some evaluation I realized that obviously each list will start and end with a search entity. This led me to start with this...

var indicies = lc.IntListData
                 .Select((v, i) => new { value = v, index = i })
                 .Where(n => search.Contains(n.value))
                 .ToList();

This reduced my list extensively, from looking at lists of millions of values to looking at thousands of anonymous types of value and index... Now, what I believe I need, is to find from "indicies", the first "n" anonymous types until I have at least one of each "value" in the list... No? Then simply use the min and max of the index values as necessary...

Any help with the Linq syntax to accomplish this would be most helpful.

Frank van Puffelen
  • 565,676
  • 79
  • 828
  • 807
Stewart Basterash
  • 223
  • 1
  • 2
  • 7
  • 1
    I'm reading this for 10x time and still not any closer what sublists your are trying to find... – Konrad Kokosa Jul 28 '14 at 18:33
  • If I understand you correctly you are looking for all subranges of your list that contain every number in your search criteria. I have updated the title to reflect my understanding. If I understood wrong, feel free to revert it. If I am right, please update your list definition and sample code to compile standalone. Both of these will help you attract people who can provide an answer. – Frank van Puffelen Jul 28 '14 at 19:00
  • Your code snippet seems to identify potential points where a subrange starts. Is that indeed what it is meant to do? – Frank van Puffelen Jul 28 '14 at 19:01

1 Answers1

1

LINQ version (with helper extension method):

If you're willing to accept a helper function to create tuples from your list of potential matches, you can get the LINQ down to this:

var subranges = matches
    .Tuples(search.Length)
    .Where(t => t.Select(n => n.Value).Distinct().Count() == search.Length)
    .Select(t => new { StartIndex=t.Min(n => n.Index), EndIndex=t.Max(n => n.Index) })
    .Select(r => list.Skip(r.StartIndex).Take(r.EndIndex-r.StartIndex+1))
    ;

The Tuples method is a variation of the extension method from this answer: https://stackoverflow.com/a/577612/209103.

public static IEnumerable<IEnumerable<T>> Tuples<T>(this IEnumerable<T> sequence, int nTuple)
{
    if(nTuple <= 0) throw new ArgumentOutOfRangeException("nTuple");

    for(int i = 0; i <= sequence.Count() - nTuple; i++)
        for (int j = i+nTuple; j < sequence.Count(); j++)
            yield return sequence.Skip(i).Take(j-i);
}

Note that Tuples is O(nn) and the entire solution is O(nn*n), so will not perform well for larger data sets.

Non LINQ versions below

Rough first draft of a non-LINQ version:

var list = new [] { 1, 3, 15, 16, 4, 27, 65, 2, 99, 3, 16, 21, 72, 1, 5, 7, 2, 8 };
var search = new [] { 1, 2, 3, 4 };

var group = new List<int>();
var start = -1;
for (var i=0; i < list.Length; i++) {
    if (search.Contains(list[i])) {
        if (!group.Any()) {
            start = i;
        }
        if (!group.Contains(list[i])) {
            group.Add(list[i]);
        }
        if (group.Count == search.Length) {
            Console.WriteLine(start+" - "+i);
            group.Clear();
            i = start + 1;
            start = -1;
        }
    }
}

This uses brute force, but could be optimized with your method of finding matching indices first.

Update - optimized by only considering relevant indices

var list = new [] { 1, 3, 15, 16, 4, 27, 65, 2, 1, 99, 3, 16, 21, 72, 1, 5, 7, 4, 2, 8 };
var search = new [] { 1, 2, 3, 4 };

var matches = list
    .Select((v,i) => new { Value=v, Index=i })
    .Where(n => search.Contains(n.Value))
    .ToList()
    ;

for (var start=0; start < matches.Count(); start++) {
    var group = new List<int>();
    group.Add(matches[start].Value);
    foreach (var match in matches.Skip(start)) {
        if (!group.Contains(match.Value)) {
            group.Add(match.Value);
        }
        if (group.Count == search.Length) {
            Console.WriteLine(matches[start].Index+" - "+match.Index);
            break;
        }
    }
}

Both approaches are non-LINQ. I'm not sure how to go about turning this into a LINQ expression. It's clearly a grouping exercise (.GroupBy), but I can't see what expression to group on.

Community
  • 1
  • 1
Frank van Puffelen
  • 565,676
  • 79
  • 828
  • 807
  • 1
    Frank... Your response was accurate and helped me greatly... thank you. The key was the grouping. the result is now far more effective than my original C# for loop. I started with the optimized list as I stated originally, and you pointed out in your optimized example... I then piped it into group, and finally ordered by index and selected first()... I did not know that group and then select first would return the first instance of every group item... this solved my problem... thanks! – Stewart Basterash Jul 28 '14 at 23:50