
I have a need to create a list of combinations of numbers. The numbers are quite small so I can use byte rather than int. However it requires many nested loops in order to get every possible combination. I'm wondering if there's a more efficient manner to do what I'm after. Code so far is:

var data = new List<byte[]>();
for (byte a = 0; a < 2; a++)
for (byte b = 0; b < 3; b++)
for (byte c = 0; c < 4; c++)
for (byte d = 0; d < 3; d++)
for (byte e = 0; e < 4; e++)
for (byte f = 0; f < 3; f++)
for (byte g = 0; g < 3; g++)
for (byte h = 0; h < 4; h++)
for (byte i = 0; i < 2; i++)
for (byte j = 0; j < 4; j++)
for (byte k = 0; k < 4; k++)
for (byte l = 0; l < 3; l++)
for (byte m = 0; m < 4; m++)
{
    data.Add(new [] {a, b, c, d, e, f, g, h, i, j, k, l, m});
}

I was considering using something like a BitArray but I'm not sure how I could incorporate it.

Any recommendations would be greatly appreciated. Alternatively, perhaps this is the fastest way of doing what I want?

EDIT: A couple of quick points (and apologies I didn't put these in the original post):

  • The numbers and the order of them (2, 3, 4, 3, 4, 3, 3 etc) are very important, so using a solution such as Generating Permutations using LINQ won't help because the maximums in each 'column' are different
  • I'm not a mathematician, so I apologise if I'm not using the technical terms like 'permutations' and 'combinations' correctly :)
  • I do need to populate all of these combinations at once - I can't just grab one or another based on an index
  • Using byte is faster than using int, I guarantee it. It's also a lot better on memory usage to have 67m+ arrays of bytes rather than ints
  • My ultimate goal here is to look for a faster alternative to nested loops.
  • I considered using parallel programming, but due to the iterative nature of what I'm trying to achieve, I couldn't find a way to do it successfully (even with ConcurrentBag) - however I'm happy to be proved wrong :)

CONCLUSION

Caramiriel has provided a good micro-optimisation which shaves some time off the loops, so I've marked that answer as correct. Eric also mentioned that it is faster to pre-allocate the List. But, at this stage it seems that the nested loops are in fact the fastest possible way of doing this (depressing, I know!).

If you want to try exactly what I was trying to benchmark with StopWatch, go with 13 loops counting up to 4 in each loop - that makes about 67m+ lines in the list. On my machine (i5-3320M 2.6GHz) it takes about 2.2s to do the optimised version.
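
For reference, pre-allocation here just means passing the final count to the List constructor so Add never has to grow and copy the backing array; a minimal sketch, assuming the 13-loops-counting-to-4 benchmark above (4^13 = 67,108,864 rows):

var data = new List<byte[]>(67108864); // 4^13 rows, so the backing array is allocated once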

benpage
  • Try to use LINQ, and if you are using a multicore processor then Parallel.For – Jalpesh Vadgama Apr 22 '15 at 06:06
  • Based on what I see, these are not permutations but combinations of a couple of very small (2-4 element) sets; is that right, or do you indeed want all/some permutations of *one* set? – Random Dev Apr 22 '15 at 06:11
  • I assume you've searched http://www.bing.com/search?q=c%23+permutation+enumerable already and for some reason (not mentioned in the post) decided against existing answers like http://stackoverflow.com/questions/4319049/generating-permutations-using-linq... Consider listing options you looked at and decided against to make this question better. – Alexei Levenkov Apr 22 '15 at 06:12
  • @decoherence actually, no, it doesn't. Trust me, I've done a lot of benchmarking with `StopWatch` – benpage Apr 22 '15 at 06:13
  • so this is really about performance? These are less than 67108864 elements – Random Dev Apr 22 '15 at 06:16
  • If this is about performance: you can preallocate the list (constructor) and unroll some loops, but I think that's about it... apart from precalculating and storing these numbers. The loops (overhead) are probably the most costly of them all, since there are few operations inside the body. – Caramiriel Apr 22 '15 at 06:24
  • Do the numbers actually relate to anything? Also, I'm wondering if you can somehow skip the first 2 iterations in everything by just allocating by shifting bits from 2^13, but I have no idea if that would be faster – Sayse Apr 22 '15 at 06:28
  • based on your comments maybe you really should precompute those, put them into a file (as you said it will not be really big) and just load all your numbers from there - aside from this your basic problem is enumerating all numbers of the form 2^a+3^b+4^c+3^d+... and if you learned how to do binary arithmetic it should be obvious how you could use this to generalize - but I don't know if this would be quicker and it surely will take some time to try (so if you are willing to experiment: **do** it :D) – Random Dev Apr 22 '15 at 06:30
  • @Sayse yes, the numbers and their order are important. – benpage Apr 22 '15 at 06:30
  • @CarstenKönig I would do that, however the numbers (2, 3, 4 etc) are dynamic. I would actually store them in an array (but I left that out for simplicity of the code) – benpage Apr 22 '15 at 06:31
  • @Caramiriel you're right - and in my actual code I have preallocated that List. I was however hoping there was a funkier way of doing what I'm after :) – benpage Apr 22 '15 at 06:32
  • Funny thing is that it only takes half a second on my machine (laptop); is it really something to optimize? – Caramiriel Apr 22 '15 at 06:39
  • @Caramiriel try it with larger numbers, and yes, it definitely is! – benpage Apr 22 '15 at 06:45
  • There likely isn't anything *faster* than the nested loops, but there are ways to shorten the code. – user253751 Apr 22 '15 at 06:56
  • @benpage: Why do you need to generate all combinations up-front? Why not generate a combination from its index when you need it? – Pieter Witvoet Apr 22 '15 at 06:58
  • @PieterWitvoet because i'm actually using all of those combinations to calculate something, and then comparing the calculations. I can't just refer to one row. – benpage Apr 22 '15 at 07:00
  • The only thing that can be optimized here is just accumulation of results. And given the amount of results, there's almost nothing you can do here. The rest depends on how this code will be used. For instance, if you don't need the entire list all the time, you could go with "lazy evaluation", or even make an "Nth item" function so as not to store them at all. – D-side Apr 22 '15 at 07:48
  • Sit back a second and look at the memory bandwidth used. Create a function that populates a contiguous block of memory of size `13 * 4^13`, and populates it with some repeating sequence using the lowest level language you can pull off (that is 0.9 gb of data). Examine the runtime in question. If you are going to beat raw memory bandwidth, you cannot actually write to memory! This can be done, but it isn't very conventional. – Yakk - Adam Nevraumont Apr 22 '15 at 15:54
  • byte is better on a large array, but for temporary variables like counting ones, int is better – phuclv Apr 22 '15 at 17:48
  • In the future @benpage, please post working code on [Code Review](http://codereview.stackexchange.com) Instead of Stack Overflow. – Quill Apr 22 '15 at 23:36
  • @benpage I've updated my answer to a parallel version that's actually faster, unlike just wrapping it in a `Parallel.For()` which caused a false sharing problem. – jdphenix Apr 23 '15 at 06:13

12 Answers


As a reminder: you probably do not need this kind of code while developing your own solution. This can and should only be used in very specific situations. Readability is often more important than speed.

You can use the properties of a struct and allocate the structure in advance. I cut off some levels in the sample below, but I'm sure you'll be able to figure out the specifics. Runs about 5-6 times faster than the original (release mode).

The block:

struct ByteBlock
{
    public byte A;
    public byte B;
    public byte C;
    public byte D;
    public byte E;
}

The loop:

var data = new ByteBlock[2*3*4*3*4];
var counter = 0;

var bytes = new ByteBlock();

for (byte a = 0; a < 2; a++)
{
    bytes.A = a;
    for (byte b = 0; b < 3; b++)
    {
        bytes.B = b;
        for (byte c = 0; c < 4; c++)
        {
            bytes.C = c;
            for (byte d = 0; d < 3; d++)
            {
                bytes.D = d;
                for (byte e = 0; e < 4; e++)
                {
                    bytes.E = e;
                    data[counter++] = bytes;
                }
            }
        }
    }
}

It's faster because it doesn't allocate a new array every time you add an item to the list. Also, since creating that array would need a reference to every value (a, b, c, d, e) on each iteration, we exploit the fact that each value is only modified once inside its own loop and assign it to the struct only at that point (data locality).

Also read the comments for side-effects.

Edited the answer to use a T[] instead of a List<T>.

Caramiriel
  • I haven't checked out this code yet, but check the values you get in the list at the end. I think you'll find that when you add `bytes` to the list, and then add it again, it's referencing the same variable - hence all your values in the list will be the same.. (happy to be proven wrong though!) – benpage Apr 22 '15 at 06:59
  • It's a struct, so you should be okay =) they're all unique. It's copied when calling the `List.Add` method. – Caramiriel Apr 22 '15 at 07:00
  • you're right! I found this behaviour before when using arrays, but not structs. – benpage Apr 22 '15 at 07:19
  • It's even faster if you allocate the capacity of the List up-front – Eric Apr 22 '15 at 07:30
  • Watch out for **stackoverflow** exceptions when allocating too many objects on the stack. – Andrei Tătar Apr 22 '15 at 08:00
  • Why do you even need extra loop variables? Couldn't you just use bytes.X as the loop variable? `for ( bytes.A = 0; bytes.A < 4; ++bytes.A )` ? – Falco Apr 22 '15 at 15:21
  • Good suggestion - I've tried that, and it's slightly slower, but not by much. Probably because of the indirect reads/writes. – Caramiriel Apr 22 '15 at 15:52
  • Replacing the big list by an array might speed it up a bit more. – CodesInChaos Apr 22 '15 at 20:21
  • @Andrew I don't get your point. This code isn't recursive and has minimal stack usage. – CodesInChaos Apr 22 '15 at 20:22
  • @CodesInChaos yes, but ByteBlock is a struct and it's allocated on the stack in this case. For 13 levels with a limit number of 4 per level it crashes on my machine in a 32 bit process. In a 64 bit process, it works fine. Here is the code I used: http://pastebin.com/43UWsPgy – Andrei Tătar Apr 23 '15 at 05:50
  • @Andrew: That's out of memory, not stackoverflow. This is because the `List.Add()` method goes beyond the point of what it can store. This will make it resize (doubles in size), which goes over 2GB of memory. Try preallocating by using new List(maxPerLevel.Aggregate(1, (x, y) => x*y)), though it's already 'random' that you need a full 2GB block of this data in-memory. Also note that data.ToArray(); is expensive because it keeps items in memory twice at that point. [rephrased] – Caramiriel Apr 23 '15 at 07:34
  • @Andrew most of these structs instances are in an array, which I expect would mean heap allocation. – Nathan Cooper Apr 23 '15 at 08:38
  • Is there a reason you put spaces before your lines of code? – A.L Apr 23 '15 at 12:22
  • Not really, just copied it from the Program.cs, but I'll remove them for the correct presentation. – Caramiriel Apr 23 '15 at 14:12
  • While this is good, I was actually able to speed this up even more by adding a method to ByteBuffer that sets all the fields. Then you call the method instead of keeping a local variable. I believe this avoids initializing data in one place (a local on the stack) and copying it to another place (an array slot on the heap), but I haven't verified by looking at the disassembly. This is a huge gain (~2x as fast). – Mike Zboray Apr 24 '15 at 02:45

What you are doing is counting (with variable radix, but still counting).

Since you are using C#, I assume you don't want to play with memory layout and low-level data structures that would let you really optimize this code.

So here I'm posting something different, which may not suit your case, but is worth noting: in case you actually access the list in a sparse fashion, here's a class that lets you compute the i-th element in linear time (rather than exponential, as in the other answers):

class Counter
{
    public int[] Radices;

    public int[] this[int n]
    {
        get 
        { 
            int[] v = new int[Radices.Length];
            int i = Radices.Length - 1;

            while (n != 0 && i >= 0)
            {
                // Hope C# has an IL opcode for div-and-remainder like x86 does
                v[i] = n % Radices[i];
                n /= Radices[i--];
            }
            return v;
        }
    }
}

You can use this class like this:

Counter c = new Counter();
c.Radices = new int[] { 2,3,4,3,4,3,3,4,2,4,4,3,4};

Now c[i] is the same as l[i], where l is your list.

As you can see, you can easily avoid all those loops :) even when you precompute the whole list, since you can simply implement a ripple-carry counter.

Counters are a well-studied subject; I strongly advise searching the literature if you feel like it.
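
For completeness, a minimal sketch of the ripple-carry idea for materialising the whole list without divisions (a hypothetical helper, not part of the Counter class above; the last radix is the least significant digit, matching the indexer):

static List<int[]> EnumerateAll(int[] radices)
{
    var total = 1;
    foreach (var r in radices) total *= r;

    var result = new List<int[]>(total);
    var current = new int[radices.Length];

    for (var n = 0; n < total; n++)
    {
        result.Add((int[])current.Clone()); // snapshot the current "digits"

        // ripple-carry increment: bump the least significant digit,
        // wrap to 0 and carry left whenever a digit reaches its radix
        for (var i = radices.Length - 1; i >= 0; i--)
        {
            if (++current[i] < radices[i]) break;
            current[i] = 0;
        }
    }
    return result;
}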

Adam
  • You can use `Math.DivRem` but I think the compiler will already optimize it. – Guillaume Apr 22 '15 at 14:22
  • I like your answer, but saying that all other answers are exponential is untrue. – Biscuits Apr 22 '15 at 20:42
  • What is the speed on this compared to Caramiriel's answer? – John Odom Apr 22 '15 at 22:49
  • Although random access is faster (no precalculation required) and the memory usage is way lower, it's about 20x slower if you calculate all the values. I think it's up to the requirements to decide which is better, but I have to admit I like this solution better. – Caramiriel Apr 23 '15 at 08:26
  • "C-kiddy-#", really? That seems entirely uncalled for. – KChaloux Apr 23 '15 at 12:56
  • And it does: [Math.DivRem](https://msdn.microsoft.com/en-us/library/yda5c8dx.aspx) – Caramiriel Apr 23 '15 at 16:05
  • I think that beyond some level, optimization is a matter of use. For example, if every array is to be used only once, you can avoid the intensive memory allocation, which is the critical bottleneck in my opinion. Further, if you want to calculate all the values, you should exploit the fact that you do single increments (i.e. +1 increments), avoiding a division. This is intended more as an "out of the box" answer or a prototype; I didn't really try to speed it up, I just like it this way :) –  Apr 23 '15 at 17:49
  • @biscuits, I honestly haven't looked at every answer, so apologies. You are sure you didn't fall into the [pseudo-polynomial](http://en.wikipedia.org/wiki/Pseudo-polynomial_time) trap, right? –  Apr 23 '15 at 17:52
  • We're effectively doing the same thing, mine's just faster (no while-loop, etc). – Biscuits Apr 23 '15 at 21:33

On my machine, this generates the combinations in 222 ms vs 760 ms (the 13 for loops):

private static byte[,] GenerateCombinations(byte[] maxNumberPerLevel)
{
    var levels = maxNumberPerLevel.Length;

    var periodsPerLevel = new int[levels];
    var totalItems = 1;
    for (var i = 0; i < levels; i++)
    {
        periodsPerLevel[i] = totalItems;
        totalItems *= maxNumberPerLevel[i];
    }

    var results = new byte[totalItems, levels];

    Parallel.For(0, levels, level =>
    {
        var periodPerLevel = periodsPerLevel[level];
        var maxPerLevel = maxNumberPerLevel[level];
        for (var i = 0; i < totalItems; i++)
            results[i, level] = (byte)(i / periodPerLevel % maxPerLevel);
    });

    return results;
}
Andrei Tătar
  • This is a great answer! Unfortunately it runs slower than the nested loops. Any chance you could edit using TPL? – benpage Apr 22 '15 at 06:28
  • still quite a bit slower, unfortunately. – benpage Apr 22 '15 at 07:23
  • @benpage There is an easy way to make it at least 2 times faster. You just have to change the results type to int[,]. This will allocate the entire array memory in one call. I'm not sure how that fits your needs (changing the return type). – Andrei Tătar Apr 22 '15 at 07:27

Method 1

One way to make it faster is to specify the capacity if you plan to keep using List<byte[]>, like this.

var data = new List<byte[]>(2 * 3 * 4 * 3 * 4 * 3 * 3 * 4 * 2 * 4 * 4 * 3 * 4);

Method 2

Furthermore, you could use System.Array directly to gain faster access. I recommend this approach if your question insists that every element be physically populated in memory, upfront.

var data = new byte[2 * 3 * 4 * 3 * 4 * 3 * 3 * 4 * 2 * 4 * 4 * 3 * 4][];
int counter = 0;

for (byte a = 0; a < 2; a++)
    for (byte b = 0; b < 3; b++)
        for (byte c = 0; c < 4; c++)
            for (byte d = 0; d < 3; d++)
                for (byte e = 0; e < 4; e++)
                    for (byte f = 0; f < 3; f++)
                        for (byte g = 0; g < 3; g++)
                            for (byte h = 0; h < 4; h++)
                                for (byte i = 0; i < 2; i++)
                                    for (byte j = 0; j < 4; j++)
                                        for (byte k = 0; k < 4; k++)
                                            for (byte l = 0; l < 3; l++)
                                                for (byte m = 0; m < 4; m++)
                                                    data[counter++] = new[] { a, b, c, d, e, f, g, h, i, j, k, l, m };

This takes 596ms to complete on my computer, which is about 10.4% faster than the code in question (which takes 658ms).

Method 3

Alternatively, you can use the following technique for low-cost initialization that suits access in a sparse fashion. This is especially favorable when only some elements may be needed and determining them all upfront is considered unnecessary. Moreover, techniques like these may become the only viable option when working with larger element sets, where memory runs short.

In this implementation every element is left to be determined lazily, on-the-fly, upon access. Naturally, this comes at a cost of additional CPU that is incurred during access.

class HypotheticalBytes
{
    private readonly int _c1, _c2, _c3, _c4, _c5, _c6, _c7, _c8, _c9, _c10, _c11, _c12;
    private readonly int _t0, _t1, _t2, _t3, _t4, _t5, _t6, _t7, _t8, _t9, _t10, _t11;

    public int Count
    {
        get { return _t0; }
    }

    public HypotheticalBytes(
        int c0, int c1, int c2, int c3, int c4, int c5, int c6, int c7, int c8, int c9, int c10, int c11, int c12)
    {
        _c1 = c1;
        _c2 = c2;
        _c3 = c3;
        _c4 = c4;
        _c5 = c5;
        _c6 = c6;
        _c7 = c7;
        _c8 = c8;
        _c9 = c9;
        _c10 = c10;
        _c11 = c11;
        _c12 = c12;
        _t11 = _c12 * c11;
        _t10 = _t11 * c10;
        _t9 = _t10 * c9;
        _t8 = _t9 * c8;
        _t7 = _t8 * c7;
        _t6 = _t7 * c6;
        _t5 = _t6 * c5;
        _t4 = _t5 * c4;
        _t3 = _t4 * c3;
        _t2 = _t3 * c2;
        _t1 = _t2 * c1;
        _t0 = _t1 * c0;
    }

    public byte[] this[int index]
    {
        get
        {
            return new[]
            {
                (byte)(index / _t1),
                (byte)((index / _t2) % _c1),
                (byte)((index / _t3) % _c2),
                (byte)((index / _t4) % _c3),
                (byte)((index / _t5) % _c4),
                (byte)((index / _t6) % _c5),
                (byte)((index / _t7) % _c6),
                (byte)((index / _t8) % _c7),
                (byte)((index / _t9) % _c8),
                (byte)((index / _t10) % _c9),
                (byte)((index / _t11) % _c10),
                (byte)((index / _c12) % _c11),
                (byte)(index % _c12)
            };
        }
    }
}

This takes 897ms to complete on my computer (also creating & adding to an Array as in Method 2), which is about 36.3% slower than the code in question (which takes 658ms).

Biscuits
  • Your second suggestion is also a significant saving in terms of memory consumption. (But I would note that it assumes that the list should not change) – Taemyr Apr 22 '15 at 07:31
  • I need the whole list created at one time - I can't refer to an index within the list. – benpage Apr 22 '15 at 07:36
  • @Taemyr Thanks. I'll update to note that accordingly. If the implementation truly insists that you have the entire list populated upfront, then this 3rd option obviously won't work for you. – Biscuits Apr 22 '15 at 07:54
  • @benpage Why do you need the list populated? – Taemyr Apr 22 '15 at 08:29
var numbers = new[] { 2, 3, 4, 3, 4, 3, 3, 4, 2, 4, 4, 3, 4 };
var result = (numbers.Select(i => Enumerable.Range(0, i))).CartesianProduct();

Using the extension method at http://ericlippert.com/2010/06/28/computing-a-cartesian-product-with-linq/

public static IEnumerable<IEnumerable<T>> CartesianProduct<T>(this IEnumerable<IEnumerable<T>> sequences)
{
    // base case: 
    IEnumerable<IEnumerable<T>> result =
        new[] { Enumerable.Empty<T>() };
    foreach (var sequence in sequences)
    {
        // don't close over the loop variable (fixed in C# 5 BTW)
        var s = sequence;
        // recursive case: use SelectMany to build 
        // the new product out of the old one 
        result =
            from seq in result
            from item in s
            select seq.Concat(new[] { item });
    }
    return result;
}
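
If the concrete List<byte[]> from the question is needed, the lazy result can be materialised afterwards; a sketch (assuming using System.Linq, and note this allocates far more than the plain nested loops):

var data = result
    .Select(row => row.Select(x => (byte)x).ToArray())
    .ToList();
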
Eric

List has an array internally where it stores its values, with a fixed length. When you call List.Add it checks whether there is enough space. When it cannot add the new element it will create a new array of a larger size, copy all the previous elements over, then add the new one. This takes quite a few cycles.

Since you know the number of elements already, you can create the list of the correct size, that should be a lot faster already.

Also, I'm not sure how you access the values, but you could create this thing once and save the image in code (loading it from disk will probably be slower than what you're doing now). How many times do you read/write to this thing?

gjvdkamp
  • I've actually tried pre-allocating a regular array and believe it or not, it's slower. As I said above, this needs to be created on-the-fly, I can't calculate it once and leave it. – benpage Apr 22 '15 at 06:45
  • really? wow - you are running with optimizations enabled right? (just asking) – Random Dev Apr 22 '15 at 06:47
  • Ah, that's another issue: regular arrays [x,y] are nice to use, but an array of arrays will be faster. http://stackoverflow.com/questions/597720/what-are-the-differences-between-a-multidimensional-array-and-an-array-of-arrays because of how they're implemented under the hood in IL – gjvdkamp Apr 22 '15 at 09:05

Here's a different way that only needs 2 loops. The idea is to increase the first element, and if that number goes over its maximum, increase the next one.

Instead of displaying the data, you can use currentValues.Clone and add that cloned version into your list (a sketch of that variant follows the code below). For me this ran faster than your version.

byte[] maxValues = {2, 3, 4};
byte[] currentValues = {0, 0, 0};

do {
    Console.WriteLine("{0}, {1}, {2}", currentValues[0], currentValues[1], currentValues[2]);

    currentValues[0] += 1;

    for (int i = 0; i <= maxValues.Length - 2; i++) {
        if (currentValues[i] < maxValues[i]) {
            break;
        }

        currentValues[i] = 0;
        currentValues[i + 1] += 1;
    }

// Stop the whole thing if the last number is over
// } while (currentValues[currentValues.Length-1] < maxValues[maxValues.Length-1]);
} while (currentValues.Last() < maxValues.Last());
  • Hope this code works, I converted it from vb
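
For illustration, the collecting variant mentioned above might look like this (a sketch, assuming the same maxValues/currentValues arrays and a using System.Linq directive for Last()):

var data = new List<byte[]>(2 * 3 * 4); // pre-sized to the number of combinations

do {
    data.Add((byte[])currentValues.Clone()); // Clone returns object, so cast back to byte[]

    currentValues[0] += 1;

    for (int i = 0; i <= maxValues.Length - 2; i++) {
        if (currentValues[i] < maxValues[i]) {
            break;
        }

        currentValues[i] = 0;
        currentValues[i + 1] += 1;
    }
} while (currentValues.Last() < maxValues.Last());
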
the_lotus

All your numbers are compile-time constants.

What about unrolling all the loops into a list (using your program to write code):

data.Add(new [] {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0});
data.Add(new [] {1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0});
etc.

That should at least take away the overhead of the for loops (if there is any).

I'm not too familiar with C#, but there seems to be some means of serializing objects. What if you just generated that List and serialized it in some form? I'm not sure if the deserialization is quicker than creating the List and adding the elements, though.
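
For what it's worth, a minimal sketch of that serialization idea using raw binary I/O (assuming using System.IO, a data list like the one in the question, and an illustrative file name; whether this beats regenerating the list would need measuring):

// Write: each combination is a fixed-length row of 13 bytes.
using (var writer = new BinaryWriter(File.Create("combinations.bin")))
    foreach (var row in data)
        writer.Write(row);

// Read back: 13 bytes per row until the end of the file.
var loaded = new List<byte[]>();
using (var reader = new BinaryReader(File.OpenRead("combinations.bin")))
    while (reader.BaseStream.Position < reader.BaseStream.Length)
        loaded.Add(reader.ReadBytes(13));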

null

Do you need the result to be an array of arrays? With the current setup the length of the inner arrays is fixed and could be replaced with structs. This would allow the entire thing to be reserved as one contiguous block of memory and provides easier access to the elements (not sure how you use this thing later on).

The approach below is much faster (41ms vs 1071ms for the original on my box):

struct element {
    public byte a;
    public byte b;
    public byte c;
    public byte d;
    public byte e;
    public byte f;
    public byte g;
    public byte h;
    public byte i;
    public byte j;
    public byte k;
    public byte l;
    public byte m;
}

element[] WithStruct() {
    var t = new element[3981312];
    int z = 0;
    for (byte a = 0; a < 2; a++)
    for (byte b = 0; b < 3; b++)
    for (byte c = 0; c < 4; c++)
    for (byte d = 0; d < 3; d++)
    for (byte e = 0; e < 4; e++)
    for (byte f = 0; f < 3; f++)
    for (byte g = 0; g < 3; g++)
    for (byte h = 0; h < 4; h++)
    for (byte i = 0; i < 2; i++)
    for (byte j = 0; j < 4; j++)
    for (byte k = 0; k < 4; k++)
    for (byte l = 0; l < 3; l++)
    for (byte m = 0; m < 4; m++)
    {
        t[z].a = a;
        t[z].b = b;
        t[z].c = c;
        t[z].d = d;
        t[z].e = e;
        t[z].f = f;
        t[z].g = g;
        t[z].h = h;
        t[z].i = i;
        t[z].j = j;
        t[z].k = k;
        t[z].l = l;
        t[z].m = m;
        z++;
    }
    return t;
}
Inverse
gjvdkamp
  • Good idea - in fact, that is actually what I have done in my real-world project - I just didn't put it in the original solution because of simplicity. I was mainly looking for a better alternative to nested loops. – benpage Apr 22 '15 at 22:57

What about using Parallel.For() to run it? (Struct optimization kudos to @Caramiriel). I slightly modified the values (a is 5 instead of 2) so I'm more confident in the results.

    var data = new ConcurrentStack<List<Bytes>>();
    var sw = new Stopwatch();

    sw.Start();

    Parallel.For(0, 5, () => new List<Bytes>(3*4*3*4*3*3*4*2*4*4*3*4),
      (a, loop, localList) => {
        var bytes = new Bytes();
        bytes.A = (byte) a;
        for (byte b = 0; b < 3; b++) {
          bytes.B = b;
          for (byte c = 0; c < 4; c++) {
            bytes.C = c; 
            for (byte d = 0; d < 3; d++) {
              bytes.D = d; 
              for (byte e = 0; e < 4; e++) {
                bytes.E = e; 
                for (byte f = 0; f < 3; f++) {
                  bytes.F = f; 
                  for (byte g = 0; g < 3; g++) {
                    bytes.G = g; 
                    for (byte h = 0; h < 4; h++) {
                      bytes.H = h; 
                      for (byte i = 0; i < 2; i++) {
                        bytes.I = i; 
                        for (byte j = 0; j < 4; j++) {
                          bytes.J = j; 
                          for (byte k = 0; k < 4; k++) {
                            bytes.K = k; 
                            for (byte l = 0; l < 3; l++) {
                              bytes.L = l;
                              for (byte m = 0; m < 4; m++) {
                                bytes.M = m;
                                localList.Add(bytes);
                              }
                            }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }


        return localList;
      }, x => {
        data.Push(x);
    });

    var joinedData = _join(data);

_join() is a private method, defined as:

private static IList<Bytes> _join(IEnumerable<IList<Bytes>> data) {
  var value = new List<Bytes>();
  foreach (var d in data) {
    value.AddRange(d);
  }
  return value;
}

On my system, this version runs approximately 6 times faster (1.718 seconds versus 0.266 seconds).

jdphenix
  • This is pretty much guaranteed to give you false sharing and will probably be many times slower. – gjvdkamp Apr 22 '15 at 06:38
  • Not bad - unfortunately it runs **slower** than the for loops. FWIW I tried it with ALL `Parallel.For`s and VS crashed! – benpage Apr 22 '15 at 06:38
  • @gjvdkamp I've updated my answer with a parallel version that I believe eliminates the false sharing problem. – jdphenix Apr 23 '15 at 06:12

Some of your numbers fit entirely into an integral number of bits, so you can "pack" them with the upper level number:

for (byte lm = 0; lm < 12; lm++)
{
    ...
    t[z].l = (byte)((lm & 12) >> 2);
    t[z].m = (byte)(lm & 3);
    ...
}

Of course, this makes the code less readable, but you save one loop. This can be done each time one of the numbers is a power of two, which is seven times in your case.
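
As a self-contained sketch of the idea on just the two innermost loops (l runs 0..2 and m runs 0..3, so m occupies exactly two bits and both can be driven by one counter):

for (byte lm = 0; lm < 12; lm++) // 3 * 4 = 12 combinations of (l, m)
{
    byte l = (byte)(lm >> 2); // upper bits: 0, 1 or 2
    byte m = (byte)(lm & 3);  // lower two bits: 0..3
    // use l and m here exactly as the two original loop variables
}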

  • I'd like to know more about this answer - could you expand on it? – benpage Apr 22 '15 at 22:58
  • Sorry for the late answer. m goes from 0 to 3, which in binary makes 00 to 11; l goes from 0 to 2, which makes 00 to 10, so if you print them separately this would make: 00 00, 00 01, 00 10, 00 11, 01 00 ... 10 11. You can merge these together into a single number of 4 bits, going from 0000 to 1011, and select the appropriate bits using a mask: lm & 3 makes the bitwise and between lm and (11)b, lm & 12 does the same with lm and (1100)b, then we shift by two bits to get the "real" number. By the way, I just realized it's enough to do lm >> 2 in this case. – Fabien Dupont Apr 28 '15 at 08:28

Here is another solution. Outside of VS it runs in about 437.5 ms, which is about 26% faster than the original code (593.7 ms on my computer):

static List<byte[]> Combinations(byte[] maxs)
{
  int length = maxs.Length;
  int count = 1; // 3981312;
  Array.ForEach(maxs, m => count *= m);
  byte[][] data = new byte[count][];
  byte[] counters = new byte[length];

  for (int r = 0; r < count; r++)
  {
    byte[] row = new byte[length];
    for (int c = 0; c < length; c++)
      row[c] = counters[c];
    data[r] = row;

    for (int i = length - 1; i >= 0; i--)
    {
      counters[i]++;
      if (counters[i] == maxs[i])
        counters[i] = 0;
      else
        break;
    }
  }

  return data.ToList();
}
Henrik