0

I've the following code:

int n = 150000;
int[] myArray = Enumerable.Range(1, n).ToArray();

So I want myArray contains an enumeration like 1,2,3,4,etc...
Of course the size of the array should be variable.

The thing is this is called millions of times inside a Parallel.For so I'm looking for a way to improve it as fast as it possible. Every iteration, n is different.

I just nugged the CommunityToolkit.HighPerformance in order to use some advantages from there, I'm wonder if I can use Span<T> to replace the above code, since I read this code:

var array = new byte[100];
var span = new Span<byte>(array);

span.Fill(255);

So I tried to do this:

var myArray = new int[n];
var span = new Span<int>(myArray );
span.Fill(/*nothing works here */);

So how can I populate that array with a serie of 1 to n?
I will accept another way instead using Fill or even Span<T>. The objective is made faster the whole process.

Leandro Bardelli
  • 10,561
  • 15
  • 79
  • 116
  • 1
    Are you concerened about memory usage and GC pressure or about performance? How do you use the array; do you really need it as array or is an `IEnumerable` enough? – Klaus Gütter Mar 10 '23 at 04:05
  • 2
    [Span.Fill(T)](https://learn.microsoft.com/en-us/dotnet/api/system.span-1.fill?view=net-7.0) "Fills the elements of this span with a specified value.", this isn't going to help you achieve your goal of filling an array with an incrementing number. –  Mar 10 '23 at 04:10
  • @KlausGütter not about memory, but speed, since I'm running millions of simulations. In this case, the array will be iterated by a foreach, and every item of the array will be inserted in int[] set1 or int[] set2 by random. Im doing this: https://stackoverflow.com/a/75632654/888472 – Leandro Bardelli Mar 10 '23 at 04:11
  • @rshepp I suppose so, but maybe was an option to "modify" the value, I will accept other way instead using Fill. The objetive is made faster the whole process. I'll update my question with this clarification, thanks. – Leandro Bardelli Mar 10 '23 at 04:13
  • Span might not help in your situation. Are you saying there is a big performance ding when filling the array? I'd be curious to see how you achieve this in code. – beautifulcoder Mar 10 '23 at 04:20
  • @beautifulcoder from what I read from: https://stackoverflow.com/questions/1897555/what-is-the-equivalent-of-memset-in-c – Leandro Bardelli Mar 10 '23 at 04:23
  • 4
    Maybe the algorithm that will operate on this array can take advantage of the fact that it knows, a priori, that each array element value equals the array element index + 1. In other words, if you *really* want it to be fast, there's a good chance you can skip this step entirely! For example, if the goal is to next partition this into two arrays, you don't need the values of the first array stored in memory at all to produce the second arrays. – Wyck Mar 10 '23 at 04:37
  • @Wyck thanks a lot, I finally change my code to reach Span. But I will left the question because I want to learn, Im very new with this. – Leandro Bardelli Mar 10 '23 at 04:54
  • 3
    Intuitively, I'm thinking that pre-allocating the array and then `foreach`ing over `Enumerable.Range(1, n)` to fill the array would be faster. But, who knows whether my intuition is any good. By the way, this site doesnt do _superlatives_ they are necessarily opinion based. If you want to find your answer, doing some benchmarking is the way to go – Flydog57 Mar 10 '23 at 05:05
  • 2
    It's worth noting that if you are going to be doing this _millions of times_, you should be doing something about buffer management if you are concerned about performance. Each time you leave one of these buffers unreferenced, you are leaving an object on the _Large Object Heap_. Only Gen2 garbage collections can collect LOH objects. Figure out how many of these you need at one time, create a collection of buffers/arrays and allocate (/deallocate) the buffers as needed. Otherwise, you process will eventually slow down from GC work. Also, doing a 150k item array allocated means zeroing that – Flydog57 Mar 10 '23 at 16:31
  • @Flydog57 thats perfect, I never think about that. And that explain the behavior im founding after doing some millions. – Leandro Bardelli Mar 10 '23 at 18:02

2 Answers2

3

Here is a vectorized implementation of a FillIncremental method. The Vector<T> is a small container of values that a single CPU core can process in parallel. In my PC the Vector.IsHardwareAccelerated is true and the Vector<int>.Count is 8. Initially a Vector<int> is filled with the values from 1 to 8. Then in each step all these values are incremented by 8 with a single += operation, and the incremented vector is copied to the next section of the target array. Finally the last few slots in the array that have not been filled by the vector (because the array Length might not be divisible by 8), are filled with a simple for loop:

/// <summary>
/// Fills an array of integers with incremented values, starting from the 'startValue'.
/// </summary>
public static void FillIncremental(int[] array, int startValue = 0)
{
    ArgumentNullException.ThrowIfNull(array);
    if (array.Length > 0 && startValue > (Int32.MaxValue - array.Length) + 1)
        throw new ArgumentOutOfRangeException(nameof(startValue));

    static void FillSimple(int[] array, int index, int length, int valueOffset)
    {
        int endIndex =  index + length;
        for (int i = index, j = index + valueOffset; i < endIndex; i++, j++)
            array[i] = j;
    }

    if (!Vector.IsHardwareAccelerated || array.Length < Vector<int>.Count)
    {
        FillSimple(array, 0, array.Length, startValue);
        return;
    }
    FillSimple(array, 0, Vector<int>.Count, startValue);
    Vector<int> vector = new(array);
    Vector<int> step = new(Vector<int>.Count);
    int endIndex = array.Length - Vector<int>.Count + 1;
    int i;
    for (i = Vector<int>.Count; i < endIndex; i += Vector<int>.Count)
    {
        vector += step;
        vector.CopyTo(array, i);
    }
    FillSimple(array, i, array.Length - i, startValue);
}

Usage example:

int n = 150_000;
int[] myArray = new int[n];
FillIncremental(myArray, 1);

In my PC the FillIncremental method is about 4 times faster than filling the array with a simple for loop (online benchmark).

I am not overly familiar with vectors, so it might be possible to optimize further the above approach.


Update: Enigmativity mentioned in the comments that the simple int[] myArray = Enumerable.Range(1, n).ToArray() is actually faster than the above vectorized implementation. My own benchmark confirms this observation. Currently I have no idea why the Enumerable.Range is so fast. According to the source code is should perform similarly to the FillSimple above, so it should be around 3 times slower (taking into account the constant time of instantiating the array). It's around 15% faster instead (on .NET 7, .NET 6 and .NET 5).

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
  • 1
    Incredible, this give me a lot to learn thanks!!! – Leandro Bardelli Mar 10 '23 at 12:50
  • 1
    I did, however, test with `Enumerable.Range(0, 150_000).ToArray()` and that was 2 to 3 times faster than `FillIncremental`. – Enigmativity Mar 13 '23 at 00:53
  • @Enigmativity according to [my measurements](https://dotnetfiddle.net/flpzhN) the `Enumerable.Range(..).ToArray()` is around 15% faster than initializing an array and calling the `FillIncremental`, which TBH I can't explain. The [source code](https://github.com/dotnet/runtime/blob/ca584ef0160c1806c91f2edf544de7070958ff55/src/libraries/System.Linq/src/System/Linq/Range.SpeedOpt.cs#L17) of the `RangeIterator.ToArray` method is identical with the `FillSimple` inside my `FillIncremental`, so it should be much slower (3 times slower, not 4, because the duration of creating a new array is constant). – Theodor Zoulias Mar 13 '23 at 02:48
  • @Enigmativity initially I assumed that the `Enumerable.Range` is vectorized, but it's not. A related [pull request](https://github.com/dotnet/runtime/pull/80633 "Vectorize Enumerable.Range(...).ToArray()") has been submitted for .NET 8, but currently has not been accepted. This exceptional performance is a mystery to me. – Theodor Zoulias Mar 13 '23 at 02:52
  • 1
    @TheodorZoulias - To be honest, I thought you nailed it with your code, and I wanted to see that there was a significant improvement. I was surprised too. – Enigmativity Mar 13 '23 at 09:42
2

There are two possible situations here:

  1. The created array is actually needed (i.e. you can't just use for loop with some counter), then if you know maximum n and it is constrained enough - then the fastest approach I could think of is to preallocate the array and use Array.Copy:

    static int[] preallocated = Enumerable.Range(1, 500_000).ToArray();
    
    int[] array = new int[150_000];
    Array.Copy(preallocated, array, array.Length)
    

    this approach also can be modified to handle the arrays out of bounds also, i.e. first copying the preallocated memory and then using great @Theodor Zoulias's approach to fill the rest (though it can become a bit cumbersome).

    Updated benchmark from another answer

  2. You don't actually need the array - if your goal is to split numbers from 1 to n into 2 arrays I would add simple for loop to comparison. Or just using Enumerable.Range and iterating it (another option is using preallocated array with readonly view via ReadOnlySpan/ReadOnlyMemory, but if you really don't need an array - I think there is no point to waste the memory).

P.S.

  1. Allocating arrays and then GCing them can be costly (especially large one - see The large object heap (LOH)), consider using ArrayPool. Note that if you are renting more than there are items in corresponding bucket, pooling approach will result in allocation (see maxArraysPerBucket of ArrayPool.Create). Consider creating a separate pool for such arrays and set clearArray to false when calling Return. And do not forget that Array.Rent can return array bigger then requested (you can switch to Span/Memory for subsequent processing)
  2. If array pooling is no go and array creation is still required, then consider using GC.AllocateUninitializedArray<T> which can have positive performance difference in case of big number of allocations due to skip zeroing of the memory (we know that we will manually fill it right away).
  3. Do proper (micro)benchmarking using proper tools and techniques. For example BenchmarkDotNet.
Guru Stron
  • 102,774
  • 10
  • 95
  • 132
  • woooooooowwwwwwwwwwww thats a lot info to process and learn, thanks a lot for the effort and the time! I will modify my code with this for sure – Leandro Bardelli Mar 11 '23 at 00:55