2

The Customers object has 30,000 records and the Orders object has 20,000 records. A left join using nested foreach loops is 4 seconds slower than using LINQ's GroupJoin. I have two questions:

  1. What is the reason for that?
  2. How can I make it faster without using LINQ?

    foreach (Customer c in Customers) {
        // Linear scan of all orders for every customer: O(N * M) comparisons.
        foreach (Order o in Orders) {
            if (c.ID == o.OwnerID) {
                c.OrderName = o.OrderName;
                break; // take only the first matching order
            }
        }
    }
    
user3587180
  • Try sorting customers on their ID and orders on their OwnerID, then use your algorithm again, but start the inner loop from the outer loop's current position (the index matching the ID) instead of from zero. – huseyin tugrul buyukisik Mar 25 '17 at 19:14
  • Why do you want to avoid LINQ? Just for the sake of reinventing the wheel? Assuming you're using LINQ to Objects, it's probably already as efficient as any implementation you're going to hand-roll. – Joe White Mar 25 '17 at 19:15
  • @JoeWhite I'm not avoiding LINQ in the actual implementation. Just trying to get better at coding, so I need to understand these details. – user3587180 Mar 25 '17 at 19:17
  • This looks brute-force; some library like ArrayFire would make it run on the GPU much faster, easily around 20 milliseconds. – huseyin tugrul buyukisik Mar 25 '17 at 19:17
  • @huseyintugrulbuyukisik Wouldn't sorting and doing a match be 3 loops? – user3587180 Mar 25 '17 at 19:17
  • C#'s own sorting (from the Array class?) should be efficient and decreases the complexity of the nested loops. – huseyin tugrul buyukisik Mar 25 '17 at 19:18
  • @huseyintugrulbuyukisik I understand sorting is more efficient. But what I don't understand is how sorting both collections and then doing a nested loop is faster than just a nested loop? I'm not arguing with you, just trying to understand the concepts. – user3587180 Mar 25 '17 at 19:20
  • @user3587180 When you sort, it is certain you don't need the elements smaller than the last picked object's value, so the next lookup can be a single step instead of scanning all the remaining elements. With the early quit, that's O(N log N) + O(N log N) instead of O(N*N). – huseyin tugrul buyukisik Mar 25 '17 at 19:22
  • @huseyintugrulbuyukisik gotcha thanks! if you post this as an answer i'll accept it. – user3587180 Mar 25 '17 at 19:24
  • @user3587180 Did you test it? Please test it, see if it is faster, then tell us about the result. – huseyin tugrul buyukisik Mar 25 '17 at 19:25
  • `GroupJoin` uses a hash-based lookup and thus has O(M + N) time complexity, which is better than the suggested sorted implementations, and certainly better than your O(M * N) nested loops. – Ivan Stoev Mar 25 '17 at 19:27
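
For reference, the `GroupJoin` left join being compared against probably looks something like this (a sketch, assuming `using System.Linq;` and that the first matching order suffices, as the `break` in the question implies):

    var query = Customers.GroupJoin(
        Orders,
        c => c.ID,          // outer key
        o => o.OwnerID,     // inner key, hashed once: O(M)
        (c, matches) => new { Customer = c, Order = matches.FirstOrDefault() });

    foreach (var pair in query)
        if (pair.Order != null)
            pair.Customer.OrderName = pair.Order.OrderName;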

2 Answers

1

Processing a sorted array is usually faster. (See "Why is processing a sorted array faster than processing an unsorted array?", one of the most upvoted questions on Stack Overflow.) That question is about hardware branch prediction, but software gains from sorting too.

Sort both arrays.

Now start the inner loop from the index where the previous customer's search stopped (the position whose OwnerID matched), not from zero. You already have the early quit, so the total complexity would be

O(N log N) + O(N + M)   instead of   O(N * M)
(sorting)    (merged scan)           (unsorted nested loops)
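
A minimal sketch of that merged scan (assuming `Customers` and `Orders` are `List<Customer>` / `List<Order>` with the field names from the question):

    // Sort once: O(N log N) + O(M log M).
    Customers.Sort((a, b) => a.ID.CompareTo(b.ID));
    Orders.Sort((a, b) => a.OwnerID.CompareTo(b.OwnerID));

    int i = 0; // cursor into Orders; it only ever moves forward
    foreach (Customer c in Customers) {
        // Skip orders below the current customer's ID; they are never rescanned.
        while (i < Orders.Count && Orders[i].OwnerID < c.ID)
            i++;
        if (i < Orders.Count && Orders[i].OwnerID == c.ID)
            c.OrderName = Orders[i].OrderName; // first match, like the break in the question
    }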

If you have time, you can install ArrayFire (a C++ library) and use a wrapper around it from C# for these brute-force calculations. Only that kind of cheating would beat LINQ's join for small (30k-100k) arrays.

The advantage of cheating dissolves when the number of elements reaches the millions, and the algorithm becomes of utmost importance, unless you have 3-4 high-end GPUs. Even then it would get stuck at around 30M elements and the algorithm would shine again, unless you have a cluster; but if you have a cluster, it would be a waste not to use an advanced algorithm.


When the CPU is used, the best option is of course C#'s own implementation. As Ivan Stoev's comment says, a good hash-based lookup is better than sorting.
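
A hash-based version along the lines of that comment might look like this (a sketch, assuming integer keys and the usual `System.Collections.Generic` import):

    // Build a lookup from OwnerID to the first matching order: O(M).
    var firstOrderByOwner = new Dictionary<int, Order>();
    foreach (Order o in Orders)
        if (!firstOrderByOwner.ContainsKey(o.OwnerID))
            firstOrderByOwner.Add(o.OwnerID, o);

    // One probe per customer: O(N). Total O(N + M), no sorting needed.
    foreach (Customer c in Customers)
        if (firstOrderByOwner.TryGetValue(c.ID, out Order o))
            c.OrderName = o.OrderName;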

huseyin tugrul buyukisik
-1

I don't know why you are trying to avoid LINQ; using two nested foreach loops is not always good practice. However, try using for loops instead of foreach; it is much faster for big lists of data.
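
For illustration, an indexed `for` version of the question's loop (a sketch assuming `List<T>` collections; only the iteration construct changes, the O(N * M) algorithm stays the same):

    for (int ci = 0; ci < Customers.Count; ci++) {
        Customer c = Customers[ci];
        for (int oi = 0; oi < Orders.Count; oi++) {
            if (c.ID == Orders[oi].OwnerID) {
                c.OrderName = Orders[oi].OrderName;
                break;
            }
        }
    }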

Abdullah Dibas