2

I have a sorted List of ints (in descending order) that I need to re-order so that each batch (e.g. in 20s) has ints that are as far apart as possible from each other.

So, for example, if the list contains the following bookings:

  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12

and I was to process them in a batch of 4, would a good approach be to split the bookings into 4 lists (12 / 3) and moving down each list of 4 take 1 from each and append to the newly created re-ordered list? So, the new list would contain the bookings in the following order:

  • 1
  • 5
  • 9
  • 2
  • 6
  • 10
  • 3
  • 7
  • 11
  • 4
  • 8
  • 12

I appreciate it's not easy to understand what I'm after with my question so it might be easier to look at my code! I came up with the following approach and was wondering if it is the best/most performant way of achieving what I'm after:

var bookings = new List<int>();

// <snip>import bookings ints from csv file into bookings list</snip>

bookings = bookings.Distinct().OrderByDescending(x => x).ToList();

int chunkSize = 20;

// get number of chunks/batches
int chunkCount = (int)Math.Ceiling((double)bookings.Count/(double)chunkSize);

// split booking list into number of chunks
// add the ith booking from each chunk into new list
// ToChunks() extension method from http://stackoverflow.com/a/6852288/1578713
var chunkList = bookings.ToChunks(chunkCount).ToList();
var reorderedBookings = new List<int>();
for (int i=0; i<chunkCount; i++)
{
    // last chunk may be smaller than chunkSize hence the check
    reorderedBookings.AddRange(from chunk in chunkList where chunk.Count() > i 
        select chunk.ToList()[i]);
}

bookings = reorderedBookings;

The Background (if you really want to know)

My problem is that I'm processing* a List of ints (bookings that are imported from a csv file and ordered) in parallel batches (of 20, say) and the closer together (more consecutive) these bookings are in the list the more likely it is that an exception will be thrown (beyond my control).

It's a tradeoff between running thousands of bookings synchronously (so no exception will be thrown) or running 20-odd in parallel with a likelihood of the said exception being thrown every once in a while. I've opted for the latter approach because it's quicker and any bookings that result in exceptions can be run synchronously after the async tasks have completed.

*calling a couple of services, passing in each booking int in turn

Appulus
  • 18,630
  • 11
  • 38
  • 46
  • When looking at, "how far apart the values are from each other" is it the max - min value, the average distance between each of the elements, what? If you've already written the function, could you provide the code for how you determine the "score" for a particular batch? – Servy Nov 28 '12 at 17:30
  • @Servy, appreciate your comments. I was using the more literal meaning of shuffle admittedly. I'm not sure I understand what you mean by your second comment as I've provided all the relevant code? – Appulus Nov 28 '12 at 17:33
  • You have demonstrated how to batch the sequence (not super well, but get it working before making it pretty). You say you want "an order so that each batch has values that are as far apart as possible from each other." I'm saying I don't know what that means. How does one come up with that value for a given batch? Once you have that function we can address the more complex issues of maximizing that value. We also need to know how to aggregate that value for each batch. Do we want one batch with the greatest value, maximize the average value between batches, etc. – Servy Nov 28 '12 at 17:39
  • Are you running each of the items in a batch in parallel, are you setting each batch as the input of a queue, each processed in parallel, or what? – Servy Nov 28 '12 at 17:41
  • @Servy, I understand my original explanation was pretty poor so I hope I've done a better job in my edit! – Appulus Nov 28 '12 at 18:00
  • I still can't tell what "as far apart as possible" means for sets of >2 numbers. Does it mean that you want to maximize the distance between the furthest pair in a set? Maximize the distance between each pairing within a set? For example, I don't know which is a "better" set: {1, 2, 20} or {1, 10, 20}. The first has an average distance of 12.6, but the smallest distance is 1. The latter has an average distance of 12.3, but the smallest distance is 9. Both have a maximum distance of 19. – Bobson Nov 28 '12 at 18:08
  • @Bobson I think what he wants is to take the first item, put it in the first batch, put the second item in the second batch, third in the third, then (if there are 3 batches) put the 4th in the first batch, the 5th in the second, and so on. Basically, the distance between items in each batch should always be the number of batches. – Servy Nov 28 '12 at 18:12
  • So your second bullet list shows four batches with three items in each batch? At each stage you will be doing three tasks in parallel? – Matthew Strawbridge Nov 28 '12 at 18:17

2 Answers2

1

Okay, so it's fairly straightforward:

public static IEnumerable<IEnumerable<T>> BatchAndSeparate<T>(IList<T> source, int batchSize)
{
    int numBatches = (int)Math.Ceiling(source.Count / (double)batchSize);

    for (int i = 0; i < numBatches; i++)
    {
        var buffer = new List<T>();
        for (int j = i; j < source.Count; j += numBatches)
        {
            buffer.Add(source[j]);
        }
        yield return buffer;
    }
}

The general algorithm is that for each batch your index starts out with that batch's number (zero indexed) and add the number of batches each iteration.

It can be used like:

var batches = BatchAndSeparate(Enumerable.Range(1, 13).ToList(), 3);

foreach (var batch in batches)
{
    Console.WriteLine(string.Join(" ", batch));
}

This will print:

1 6 11
2 7 12
3 8 13
4 9
5 10

When displayed this way it's easy to see that it's correct. Just go down each column and see that the numbers increase by one, and then move from the left-most column to the right.

Servy
  • 202,030
  • 26
  • 332
  • 449
1

Assuming that Servy is right about what you want, you can do it as either

int i = 0;
var grouped = bookings.OrderBy(x => (i++) % chunkSize);

or

int i = 0;
var grouped = bookings.GroupBy(x => (i++) % chunkSize);

The first will give you a single list, the second will break it up into separate lists.

Bobson
  • 13,498
  • 5
  • 55
  • 80