3

I have such code

    senders.FirstOrDefault(sender => !sender.IsBusy);

This line is called pretty often.

The problem is that it always returns the first non-busy sender; the same first sender is returned pretty often, but the last one is returned very rarely. How to easily balance this?

Ideally, on every call I should return the most rarely used sender. I.e. between all non-busy senders select the one that was selected the least number of times during the last second.

stakx - no longer contributing
  • 83,039
  • 20
  • 168
  • 268
Oleg Vazhnev
  • 23,239
  • 54
  • 171
  • 305
  • Do you want pseudo random or least busy? Least busy implies that you keep a history of who is busy and who is not and refer to that history to decide the least busy sender over a period of time. – Bazzz Aug 05 '11 at 08:26
  • pseudo random would be a big improvement, however it would be preferreable to keep history, because It would be nice to report "the load" (sends per second), also it would be nice if I can "limit" the load – Oleg Vazhnev Aug 05 '11 at 08:30
  • 2
    You've got quite a few answers suggesting a `RandomOrDefault` or `Shuffle` extension method. If you use any of these solutions, remember to filter out as many senders as soon as you can: `senders.Shuffle().FirstOrDefault(s => !s.IsBusy)` (bad, since busy senders don't need to be shuffled) vs. `senders.Where(s => !s.IsBusy).RandomOrDefault()` (good). Ideally, you'd have an extension method that does both: `senders.RandomOrDefault(s => !s.IsBusy)` – stakx - no longer contributing Aug 05 '11 at 08:36
  • check my solution for non-random approach – Grozz Aug 05 '11 at 08:55
  • @stakx, I think you are not 100 % accurate in your comment. If your `RandomOrDefault` method takes a predicate there is not point calling `Where` before `RandomOrDefault`... oh, sorry missed the last part of the comment that is mention this. – Tomas Jansson Aug 05 '11 at 20:45

7 Answers7

3

Maybe something like:

public static T RandomOrDefault<T>(this IEnumerable<T> dataSet)
{ 
    return dataSet.RandomOrDefault(y => true);
}

public static T RandomOrDefault<T>(this IEnumerable<T> dataSet, Func<T, bool> filter)
{
    var elems = dataSet.Where(filter).ToList();
    var count = elems.Count;
    if (count == 0)
    {
        return default(T);
    }
    var random = new Random();
    var index = random.Next(count - 1);
    return elems[index];
}

then you can call it with:

senders.RandomOrDefault(sender => !sender.IsBusy);
Tomas Jansson
  • 22,767
  • 13
  • 83
  • 137
  • +1. Such an extension method would result in code that probably couldn't be more expressive and easier-to-understand. – stakx - no longer contributing Aug 05 '11 at 08:45
  • This algorithm iterates over sequence 3 times. It's not so bad, but this can be implemented with only 1 iteration. BTW, you have a syntax error at `dataSet.Length`, it should be `dataSet.Count()` I suppose. – Dmitrii Lobanov Aug 05 '11 at 12:23
  • @Dmitry Lobanov, 3 times? I think it is two, one when calling two list and one when I getting the element, and I also think it might be harder to do it in less than two since I'm taking in `IEnumerable`, but I might be wrong. What I mean is that you have to get the count and to get the count from an `IEnumerable` you have to iterate over it, or am I wrong? Please come with your suggestion. – Tomas Jansson Aug 05 '11 at 12:38
  • @Dmitry Lobanov, didn't see your answer first. Nice solution. But regarding the complexity so is my solution also O(n). I even think that I can change `elems.ElementAt` to `elems[index]` which would make that a constant operation => one iteration over the elements. The difference with your solution is that I copying everything to memory with the `ToList`. – Tomas Jansson Aug 05 '11 at 12:48
  • Yes, if you call `Count()` on `IEnumerable`, you iterate over it. The same is valid if you call `ElementAt()` on IEnumerable. But there must be some optimizations in LINQ when actual type is something like ICollection (with `Count` property) or IList (with `indexer` property). – Dmitrii Lobanov Aug 05 '11 at 20:00
  • Your current implementation should iterate only once (when calling `ToList()` method). And perhaps you would like to write `elems.Count` instead of `dataset.Count`. BTW, "my solution" is not "my solution" actually as it has been inspired by the algorithm from Tomas Petricek's book :). As I mentioned in my answer, "my solution" is better when dealing with large sequences or some streams when you would not like to load everything in memory. – Dmitrii Lobanov Aug 05 '11 at 20:10
  • @Dmitry Lobanov, and that was the case all the time, that was why I made converted it to a list since `Length` is a property on the list and not a method. I used the `ElementAt` first which wasn't optimal but when changing to the use the index instead it should work in constant time. – Tomas Jansson Aug 05 '11 at 20:43
  • I would suggest doing all the work in the overload which doesn't take a filter. That way the overload which does take a filter can apply the filter, and pipe the result into the other overload. This way you don't have to have the ineffectual but no less time consuming `y => true` predicate gumming things up. What do you think? – CSJ Mar 04 '14 at 18:22
  • @TomasJansson, looks like u got an error in ur code. `var count = dataSet.Count;` there must be `elems` instead of `dataSet` otherwise, if non of the elements are inside filter we will get `ArgumentOutOfRangeException` on last line. – Seekeer Jun 05 '14 at 15:30
2

If you want to get the least used one efficiently you will be probably good with the following non-Linq 'list rotation' solution, which is O(n) effiency and O(1) space unlike most of others:

// keep track of these
List<Sender> senders;
int nSelected = 0;          // number of already selected senders

// ...

// solution
int total = senders.Count;  // total number of senders
// looking for next non-busy sender
Sender s = null;
for (int i = 0; i < total; i++)
{
    int ind = (i + nSelected) % total; // getting the one 'after' previous
    if (!senders[ind].IsBusy)
    {
        s = senders[ind];
        ++nSelected;
        break;
    }
}

Of course this adds the must-be-indexable constraint on senders container.

Grozz
  • 8,317
  • 4
  • 38
  • 53
  • i like that one, but how can I "metric" the load? I.e. I want "how many times is sent in last second" online statistics..... hm.. but with such statistic I can just pick up the most rarely used sender... – Oleg Vazhnev Aug 05 '11 at 09:06
  • @Grozz, fixed a small mistake in your code: It should select *non-busy* senders. – stakx - no longer contributing Aug 05 '11 at 09:06
  • @stakx: thanks. @javapowered: do you need total amount or amount for each sender? the latter should be about the same, so probably total, for that you can add tracking of time right where you increment `nSelected`. – Grozz Aug 05 '11 at 09:26
  • @Grozz I think i need per-sender statistics to diagnose possible problems (network latency etc.) For example if one sender "lags" it should be "busy" more ofthen resulting in less data processing – Oleg Vazhnev Aug 08 '11 at 20:50
  • @Grozz why not replace `++nSelected` with `nSelected = ind + 1` ? – Oleg Vazhnev Aug 08 '11 at 21:10
1

You could easily reorder by a new Guid, like this:

senders.Where(sender => !sender.IsBusy).OrderBy(x => Guid.NewGuid()).FirstOrDefault();

You don't mess with random numbers, you don't have to identify a "range" for these numbers. It just plain works and it's pretty elegant, I think.

Matteo Mosca
  • 7,380
  • 4
  • 44
  • 80
  • I'm sure this works in practice, but I wouldn't recommend it for three reasons: **1.** GUIDs are probably more expensive to create than pseudo-random numbers. This by itself wouldn't be so bad, but: **2.** GUIDs aren't necessarily random. I've seen batches of very similar, "sequential" GUIDs, which would not be usable for shuffling a collection. **3.** Sorting by GUID obstructs the true meaning of this code and makes it harder to understand. – stakx - no longer contributing Aug 05 '11 at 08:42
  • I can argue that 1- true but the expense must be evaulated in the context (it can be from trivial to machine killing, only the actual developer will know that) 2 - They aren't random if you generate sequential guids. Using Guid.NewGuid() will give you random guids. Think I posted this without knowing the result? ;) 3 - How a single OrderBy(x => Guid.NewGuid()) is harder to understand than maybe 4-5 more lines of code, or calls to other methods? – Matteo Mosca Aug 05 '11 at 08:45
  • Concerning **3.**: `.OrderBy(x => Guid.NewGuid())` seems more difficult to understand to me because GUIDs are not a part of the problem domain (ie. selecting non-busy senders), yet suddenly show up in the query. Someone who will read that code ("sort by random GUIDs") might easily get distracted by thinking, what do *GUIDs* have to do with this? OTOH, something like `.RandomOrDefault(sender => !sender.IsBusy)` says almost *exactly* what the task is all about: "Give me (if you can) a random sender that is not busy." Which is closer to the problem? -- **1.** and **2.**: fair enough. :) – stakx - no longer contributing Aug 05 '11 at 08:55
  • (SO seems to gulps down every instance of @Matteo at beginnings of comments. I'm posting this comment so that you get notified.) – stakx - no longer contributing Aug 05 '11 at 08:58
  • @stakx In that case, you can simply make an extension method Shuffle() or whatever you want to call it that internally does the new guid thing. Readability and domain problem solved :) – Matteo Mosca Aug 05 '11 at 09:00
  • Agreed, @Matteo, that would be an option. – stakx - no longer contributing Aug 05 '11 at 09:00
  • Elegant, but very inefficient for this particular task. – Grozz Aug 05 '11 at 09:35
0

You could use the "Shuffle" extension method from this post before your FirstOrDefault

Community
  • 1
  • 1
alun
  • 3,393
  • 21
  • 26
0

Keep a look-up of senders that you have used and the time when they were used.

var recentlyUsed = new Dictionary<Sender, DateTime>();
var sender = senders.FirstOrDefault(sender => !sender.IsBusy && (!recentlyUsed.ContainsKey(sender) || recentlyUsed[sender] < DateTime.Now.AddSeconds(-1)));
if (sender != null)
    recentlyUsed[sender] = DateTime.Now;
Tim Rogers
  • 21,297
  • 6
  • 52
  • 68
0

You could use Skip with a random number less than the total number of non-busy senders.

senders.Where(sender => !sender.IsBusy).Skip(randomNumber).FirstOrDefault();

Identifying a sensible limit for the random number might be a bit tricky though.

Scott Munro
  • 13,369
  • 3
  • 74
  • 80
0

Based on algorithm from the "Real world functional programming" book here's the O(n) implementation of extension method for taking random or default value from IEnumearble.

public static class SampleExtensions
{
    // The Random class is instantiated as singleton 
    // because it would give bad random values 
    // if instantiated on every call to RandomOrDefault method
    private static readonly Random RandomGenerator = new Random(unchecked((int)DateTime.Now.Ticks));

    public static T RandomOrDefault<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        IEnumerable<T> filtered = source.Where(predicate);

        int count = 0;
        T selected = default(T);
        foreach (T current in filtered)
        {
            if (RandomGenerator.Next(0, ++count) == 0)
            {
                selected = current;
            }
        }

        return selected;
    }

    public static T RandomOrDefault<T>(this IEnumerable<T> source)
    {
        return RandomOrDefault(source, element => true);
    }
}

Here's the code to ensure that this algorithm really gives the Uniform distribution:

[Test]
public void TestRandom()
{
    IEnumerable<int> source = Enumerable.Range(1, 10);
    Dictionary<int, int> result = source.ToDictionary(element => element, element => 0);
    result[0] = 0;

    const int Limit = 1000000;
    for (int i = 0; i < Limit; i++)
    {
        result[source.RandomOrDefault()]++;
    }

    foreach (var pair in result)
    {
        Console.WriteLine("{0}: {1:F2}%", pair.Key, pair.Value * 100f / Limit);
    }

    Console.WriteLine(Enumerable.Empty<int>().RandomOrDefault());
}

The output of TestRandom method is:

1: 9,92%
2: 10,03%
3: 10,04%
4: 9,99%
5: 10,00%
6: 10,01%
7: 9,98%
8: 10,03%
9: 9,97%
10: 10,02%
0: 0,00%
0
Dmitrii Lobanov
  • 4,897
  • 1
  • 33
  • 50