7

So I was taking an online test where i had to implement a piece of code to simply check if the value was in the array. I wrote the following code:

    using System;
    using System.IO;
    using System.Linq;

    public class Check
    {
        public static bool ExistsInArray(int[] ints, int val)
        {
            if (ints.Contains(val)) return true;
            else return false;
        }
    }

Now i didn't see any problems here because the code works fine but somehow i still failed the test because this is "not fast enough" once the array contains a million values.

The only code i wrote myself is:

    if (ints.Contains(val)) return true;
    else return false;

The other code i was given to work with.

Is there a way to speed up this process?

Thanks in advance.

EDIT: I came across a page where someone apparently took the same test as i took and it seems to come down to saving CPU cycles.

Reference: How to save CPU cycles when searching for a value in a sorted list?

Now his solution within the method is:

    var lower = 0;
    var upper = ints.Length - 1;

    if ( k < ints[lower] || k > ints[upper] ) return false;
    if ( k == ints[lower] ) return true;
    if ( k == ints[upper] ) return true;

    do
    {
        var middle = lower + ( upper - lower ) / 2;

        if ( ints[middle] == k ) return true;
        if ( lower == upper ) return false;

        if ( k < ints[middle] )
            upper = Math.Max( lower, middle - 1 );
        else
            lower = Math.Min( upper, middle + 1 );
    } while ( true );

Now i see how this code works but it's not clear to me why this is supposed to be faster. Would be nice if someone could elaborate.

DLP
  • 209
  • 4
  • 11
  • 2
    Use `HashSet` instead of `int[]`. Also the method itself should just be scrubbed because why not write `return ints.Contains.val);` and if you are doing that why not just write that in your calling code instead of this method? – Igor Jul 11 '19 at 14:22
  • @Igor I doubt constructing a HashSet from an array and then querying it will be faster than `Contains`. – V0ldek Jul 11 '19 at 14:23
  • is the array sorted? – Chetan Jul 11 '19 at 14:23
  • 1
    You should add the actual requirements and conditions of the test. – skolldev Jul 11 '19 at 14:24
  • That could be an option if i was fully free to write the code indeed but i've been given an array and need to stay working with it so i can't just chance int[] to HashSet. (The array is sorted) – DLP Jul 11 '19 at 14:24
  • 2
    @V0ldek - who said anything about creating a HashSet in that method and then querying it? If you want to query a collection of millions of values you should not be using an array to begin with. – Igor Jul 11 '19 at 14:24
  • @Igor I think it's pretty clear from the question that he's receiving an `int[]` as the input. – V0ldek Jul 11 '19 at 14:25
  • If the array is sorted then Binary Search what you do to search an item from a sorted array. – Chetan Jul 11 '19 at 14:25
  • This is an interesting question for me. I would like to know does c# implicitly uses any search algorithm to search for the value or do we have to use it manually ? – Thameem Jul 11 '19 at 14:25
  • `HashSet`is only helpful if you have to search 500 different values in an array of 1 million items. – k3b Jul 11 '19 at 14:28
  • 1
    @Thameem Yeah, the so called "for loop algorithm". ([source](https://github.com/dotnet/corefx/blob/master/src/System.Linq/src/System/Linq/Contains.cs)) – V0ldek Jul 11 '19 at 14:28
  • Your edit is literally code for a binary search algorithm, though it's overly complicated as you can call `Array.BinarySearch`. Look at Mihir Dave's answer. IT WILL ONLY WORK FOR SORTED ARRAYS THOUGH. And you never said if the array was sorted. – V0ldek Jul 11 '19 at 14:58
  • @V0ldek in one of my previous comments i did elaborate the array was sorted but that aside if it's the same code as binary search algorithm it's a mystery to me why the short "return Array.BinarySearch(ints, val) > 0;" doesn't get taken as a correct answer while it does actually work but that's no fault of me or the code i guess the tests are just wrong in that way. – DLP Jul 11 '19 at 15:01
  • @DLP the tests are probably right. The short `return Array.BinarySearch(ints, val)` should have been checking greater than or equal to 0. @MihirDave updated their answer to reflect this. Try it with that fix and I'd bet your tests pass. – Joshua Robinson Jul 11 '19 at 15:07

4 Answers4

11

If it's sorted array you can use BinarySearch To speed Up the process

public static bool ExistsInArray(int[] ints, int val)
{
    return Array.BinarySearch(ints, val) >= 0;
}
Mihir Dave
  • 3,954
  • 1
  • 12
  • 28
  • So i tried this but the results are blowing my mind. When i test the code it works perfectly like it should but when i look at the evaluation it somehow says the method doens't even work on small methods while the results clearly prove it does work. – DLP Jul 11 '19 at 14:38
  • `the method doens't even work on small methods`: did you mean "it has to much performance overhead if you have small arrays" ? – k3b Jul 11 '19 at 14:42
  • Maybe the array isn't sorted then? – V0ldek Jul 11 '19 at 14:43
  • The array is sorted but there is no further explanation to why it says it doesn't work. It only says "Works on smaller arrays --> failed" – DLP Jul 11 '19 at 14:56
  • It should almost certainly be `return Array.BinarySearch(ints, val) >= 0`. `val` might be the first item in the array, after all. – Joshua Robinson Jul 11 '19 at 15:02
  • @DLP here you can add a check on Length of an array if it's small you can use your code, but the question is what number is small? – Mihir Dave Jul 11 '19 at 15:03
  • So somehow the test now do see this as a correct answer. I myself have no idea why it didn't take this the first 10 times i tried but now it does. Thanks for all the help this really helped me out. – DLP Jul 11 '19 at 15:11
  • @DLP it didn't take the first 10 times because there was a typo in the solution. The original answer would have failed if `val` were the first item in `ints`. The correction that @MihirDave made fixes that, and that is why it is working now :). – Joshua Robinson Jul 11 '19 at 15:18
0

You can use Parallel, something like the code below:

namespace ParallelDemo
{
    class Program
    {
        static void Main()
        {
            var options = new ParallelOptions()
            {
                MaxDegreeOfParallelism = 2
            };
            List<int> integerList = Enumerable.Range(0,10).ToList();
            Parallel.ForEach(integerList, options, i =>
            {
                Console.WriteLine(@"value of i = {0}, thread = {1}",
                    i, Thread.CurrentThread.ManagedThreadId);
            });

            Console.WriteLine("Press any key to exist");
            Console.ReadLine();
        }
    }
}

Note: It will speed up but you're going to use more memory

Michael
  • 21
  • 3
  • 2
    How would this help finding an item from the collection? – Chetan Jul 11 '19 at 14:26
  • I assume that the idea meant here is use several processors in parallel to solve the problem as decribed in [Map - Reduce](https://en.wikipedia.org/wiki/MapReduce). This code presented here demostrates parallelism but does not solve the initial question to check if a nuimber is contained in a list. – k3b Jul 11 '19 at 14:36
0

If the input array is already sorted, then using BinarySearch is the best approach.

.NET has inbuilt support of BinarySearch by using Array.BinarySearch method.

Just did a quick experiment on Contains and BinarySearch with a sorted array of 1 Million integer values as following.

public static void Main()
{
    var collection = Enumerable.Range(0, 1000000).ToArray();

    var st = new Stopwatch();

    var val = 999999;

    st.Start();

    var isExist = collection.Contains(val);

    st.Stop();

    Console.WriteLine("Time taken for Contains : {0}", st.Elapsed.TotalMilliseconds);

    t.Restart();

    var p = BinarySearchArray(collection, 0, collection.Length - 1, val);

    st.Stop();
    if(p == -1)
    {
        Console.WriteLine("Not Found");
    }
    else
    {
        Console.WriteLine("Item found at position : {0}", p);
    }

    Console.WriteLine("Time taken for binary search {0}", st.Elapsed.TotalMilliseconds);
}

private static int BinarySearchArray(int[] inputArray, int lower, int upper, int val)
{
    if(lower > upper)
        return -1;

    var midpoint = (upper + lower) / 2;

    if(inputArray[midpoint] == val)
    {
        return midpoint;
    }
    else if(inputArray[midpoint] > val)
    {
        upper  = midpoint - 1;              
    }
    else if(inputArray[midpoint] < val)
    {
        lower =  midpoint+1;    
    }

    return BinarySearchArray(inputArray, lower, upper, val);
}

Following is the output.

Time taken for Contains : 1.0518
Item found at position : 999999
Time taken for binary search 0.1522

It is clear that the BinarySearch has the upper hand here.

.NET's Contains method does not use BinarySearch internally. Contains is good for small collections but for bigger arrays BinarySearch is better approach.

Chetan
  • 6,711
  • 3
  • 22
  • 32
0

The correct answer is: it depends.

  • Is the list sorted?
  • How big is the list?
  • How many cores can you throw at the problem?

The simplest answer is that Linq, for all its wonder is actually quite slow. It uses a lot of reflection and generally performs a lot of work under the covers. It's great when ease of readability is your main goal. But for performance? No.

In a single threaded, unsorted list, the old fashioned for loop will give you the best results. If it is sorted then a binary search or some version of the a quick search will work best.

As for parallel, C# has the the parallel class. But beware, if the list is small enough, the overhead of creating threads can overcome your search time.

Simple, single threaded, unsorted answer:

    public static bool ExistsInArray(int[] ints, int val)
    {
        for( int index = 0, count = ints.GetLowerBound(0); index < count; ++index)
        {
            if (ints[index] == val) return true;
        }
        return false;
    }

it's possible that the site you're looking wants this instead. But this only works if the array is sorted.

    public static bool ExistsInArray(int[] ints, int val)
    {
        return Array.BinarySearch(ints, val) > 0;
    }

Supporting posts that demonstrate that Linq is not so fast.

Display name
  • 1,228
  • 1
  • 18
  • 29
  • Again, do you have data to prove that LINQ is doing something slower? I'm pretty sure there is absolutely no reflection involved in a simple `Contains` call, and certainly not "a lot". [See the source of Enumerable.Contains](https://github.com/dotnet/corefx/blob/master/src/System.Linq/src/System/Linq/Contains.cs). It's not a heavy method. – V0ldek Jul 11 '19 at 14:50
  • It's quite the opposite, LINQ is usually greatly optimized and [this simple benchmark](http://dotnetfiddle.net/IfQuzY) shows that a claim that it performs worse is simply not true. You should probably run much more cases to see how exactly it fares in performance, but the chances that a handwritten for loop will be faster than a carefully written and tested LINQ method are slim. – V0ldek Jul 11 '19 at 14:52
  • I'll see what I can find on the net. However, I can say that our in house experience shows that Linq is obviously slower and I have fixed more than one performance issue by swapping out Linq for more traditional methods. it's only an issue when the linq statement is itself part of a larger loop. – Display name Jul 11 '19 at 14:58
  • Yes, see this stack overflow link: https://stackoverflow.com/questions/14893924/for-vs-linq-performance-vs-future. The author of the answer repeats the same thing I just said. – Display name Jul 11 '19 at 15:00
  • And here: https://www.anujvarma.com/linq-versus-loopingperformance/ this author says Linq takes TIWCE as much time. – Display name Jul 11 '19 at 15:04
  • But how does a complicated query with lambdas even compare to a `Contains`? You missed the key point of that linked question: "if you measure a LINQ query to be a problem just ditch it". You won't measure it to be a problem here, especially since the problem here is that the author's supposed to use a binsearch. – V0ldek Jul 11 '19 at 15:07
  • Everything I can find regarding Linq shows that at best one may see some modest performance improvements but that's mostly when one uses a foreach or nested loops with out proper exit conditions to quit searching when the desired elements are discovered. THE VAST MAJORITY of posts clearly show that Linq has a lot of overhead. Despite the claims from the .NET team that made it. it is generally not so performant. IT IS USEFUL to help make code easier to read and maintenance, which is very important. But for performance: NO. – Display name Jul 11 '19 at 15:15
  • I'm... confused? Did you not read the entire article from https://www.anujvarma.com/linq-versus-loopingperformance/ ? The author says LINQ takes twice as much time for an array with 3000 items, but LINQ was "clearly faster - by a factor of 5." when he increased the size of the array to 40,000 items. – Joshua Robinson Jul 11 '19 at 15:15
  • Again, It comes down to: it depends. Just like SQL tuning, you have to know what you're using the tools against. – Display name Jul 11 '19 at 15:16