191

I'm busy rewriting an old project that was done in C++, to C#.

My task is to rewrite the program so that it functions as close to the original as possible.

As part of the file handling, the previous developer created a structure containing a ton of fields that correspond to the fixed format the file has to be written in, so all that work is already done for me.

These fields are all byte arrays. What the C++ code then does is use memset to set this entire structure to all space characters (0x20). One line of code. Easy.

This is very important as the utility that this file eventually goes to is expecting the file in this format. What I've had to do is change this struct to a class in C#, but I cannot find a way to easily initialize each of these byte arrays to all space characters.

What I've ended up having to do is this in the class constructor:

//Initialize all of the variables to spaces.
int index = 0;
foreach (byte b in UserCode)
{
    UserCode[index] = 0x20;
    index++;
}

This works fine, but I'm sure there must be a simpler way to do this. When the array is set with UserCode = new byte[6] in the constructor, the byte array is automatically initialized to the default value of zero. Is there no way to make it all spaces upon declaration, so that it is initialized this way as soon as I call my class's constructor? Or is there some memset-like function?

ROMANIA_engineer
DeVil

13 Answers

239

For small arrays use array initialisation syntax:

var sevenItems = new byte[] { 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20 };

For larger arrays use a standard for loop. This is the most readable and efficient way to do it:

var sevenThousandItems = new byte[7000];
for (int i = 0; i < sevenThousandItems.Length; i++)
{
    sevenThousandItems[i] = 0x20;
}

Of course, if you need to do this a lot then you could create a helper method to help keep your code concise:

byte[] sevenItems = CreateSpecialByteArray(7);
byte[] sevenThousandItems = CreateSpecialByteArray(7000);

// ...

public static byte[] CreateSpecialByteArray(int length)
{
    var arr = new byte[length];
    for (int i = 0; i < arr.Length; i++)
    {
        arr[i] = 0x20;
    }
    return arr;
}
LukeH
  • Hmmm... not a bad suggestion. That would indeed be both more efficient and more readable than the `Enumerable` method. Thanks for the input. – DeVil May 27 '11 at 09:45
  • 7
    You might want to turn that into an extension method, too. That way you could call it like `byte[] b = new byte[5000].Initialize(0x20);` The extension method would be declared as `public static byte[] Initialize(this byte[] array, byte defaultValue)` and contain the for loop. It should return the array. – Thorsten Dittmar May 27 '11 at 11:43
  • How come this is legal but new byte {4,3,2}; throws an error saying byte doesn't implement the enumerable type? – fIwJlxSzApHEZIl Nov 16 '12 at 22:00
  • 2
    The for loop should be using a decrement operation. I have done extensive benchmarking and a decrementing for loop is typically twice as fast as an incrementing for loop, when the body has only a simple instruction such as filling an array element. – deegee Jun 18 '13 at 21:47
  • 4
    @advocate: The initialization ```new byte {4, 3, 2}``` is missing the square brackets ```[]``` to declare an array. Also, your constants need to be convertible to ```byte```, which numbers (```int```s) such as 4, 3, and 2 are not. So it has to be: ```new byte[] { (byte) 4, (byte) 3, (byte) 2}```, or the hex syntax. – Oliver Dec 01 '14 at 21:44
  • @Oliver: it has to be `new byte[] { 4, 3, 2 }` (implicitly typed literals) or `new [] { (byte)4, (byte)3, (byte)2 }` (implicitly typed array) or, like you said, `new byte[] { 0x04, 0x03, 0x02 }` (implicitly typed literals) but saying `new byte[] { (byte)4, (byte)3, (byte)2 }` is unnecessarily verbose. – Graeme Wicksted Feb 09 '17 at 19:39
  • Decrementing is not faster in a loop that makes no sense. However incrementing when the comparison calls the Length function each time would make the decrementing version faster as Length is called once. Alternatively a local variable set to Length would make increment just as fast as decrement. – Gregory Morse Feb 17 '20 at 05:05
  • Can a byte array easily be appended, like in a loop, or is that inefficient? –  Jul 04 '22 at 16:03
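The extension method suggested in the comments above could look roughly like the sketch below. The class name `ByteArrayExtensions` and the method name `FilledWith` are made up for illustration (a different name from the comment's `Initialize` is used only to avoid confusion with the existing parameterless `Array.Initialize()`).

```csharp
using System;

public static class ByteArrayExtensions
{
    // Hypothetical helper: sets every element to 'value' and returns
    // the same array instance so the call can be chained after 'new'.
    public static byte[] FilledWith(this byte[] array, byte value)
    {
        if (array == null) throw new ArgumentNullException(nameof(array));

        for (int i = 0; i < array.Length; i++)
        {
            array[i] = value;
        }
        return array;
    }
}
```

With that in place, a field can be created and filled in one expression, e.g. `byte[] userCode = new byte[6].FilledWith(0x20);`.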
113

Use this to create the array in the first place:

byte[] array = Enumerable.Repeat((byte)0x20, <number of elements>).ToArray();

Replace <number of elements> with the desired array size.
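For completeness, here is a small self-contained sketch of the same call wrapped in a method. The names `RepeatExample` and `Spaces` are only illustrative. It also shows why the `(byte)` cast is there.

```csharp
using System;
using System.Linq;

public static class RepeatExample
{
    // Returns a new array of 'count' space characters (0x20).
    public static byte[] Spaces(int count)
    {
        // The (byte) cast matters: Enumerable.Repeat(0x20, count) would be an
        // IEnumerable<int>, and its ToArray() result (int[]) cannot be
        // assigned to a byte[] variable.
        return Enumerable.Repeat((byte)0x20, count).ToArray();
    }
}
```

Usage would be along the lines of `UserCode = RepeatExample.Spaces(6);`.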

Thorsten Dittmar
  • 4
    This is inferior to the OP's original solution. This still involves creating and filling the array in separate steps. In fact, it will usually end up creating, filling and then discarding several (perhaps many) intermediate arrays instead of just allocating a single array and then filling it. – LukeH May 27 '11 at 09:25
  • 3
    Interestingly as the question that @PompolutZ found http://stackoverflow.com/questions/1897555/what-is-the-equivalent-of-memset-in-c suggests this is not as efficient as the loop which probably makes some sense really since this is doing a lot more than just setting some values. It might be simpler (which is what was asked) but I don't know that this means better. :) As always test performance if relevant. ;-) – Chris May 27 '11 at 09:25
  • 1
    @LukeH/@Chris: I read the performance analysis that PompolutZ found in his second link. It's rather interesting to see that the simple `for` loop is so much more efficient for a large number of array elements and iterations. In the OP's scenario, performance should not be an issue - and he asked for something "simpler" than a loop ;-) – Thorsten Dittmar May 27 '11 at 09:31
  • Indeed. My main concern here is more compact code; if I have to do this method for each of the files that the program has to generate and process and keep things as they are, I'm going to have to copy and paste a ton of loops. I'm sure there are ways to implement this file-handling in C# that will make this problem moot, but I'm on quite a tight time-schedule here, so it's much more convenient to mimic the way it was done in the old code. As I've mentioned in another comment these arrays are all very small, but there are a lot of them so the `Enumerable` method is the most compact. – DeVil May 27 '11 at 09:41
  • Seems that this generates a int array, not a byte array as requested. – Ben Aug 03 '16 at 19:40
44

You can use Enumerable.Repeat()

Enumerable.Repeat generates a sequence that contains one repeated value.

Array of 100 items initialized to 0x20:

byte[] arr1 = Enumerable.Repeat((byte)0x20,100).ToArray();
Fred
Yochai Timmer
  • 1
    Is the .ToArray() needed as in Thorsten's answers? – Chris May 27 '11 at 09:18
  • Not sure about it, it might do it implicitly. (I don't have vs2010 running to test it) – Yochai Timmer May 27 '11 at 09:19
  • 4
    Enumerable.Repeat() returns an IEnumerable, so the explicit call of ToArray() is required. – Scott Ferguson Apr 17 '12 at 00:59
  • 2
    It is also required to cast the element to repeat to `byte` to get a Byte array, rather than an `Int32` array as it would come out in this case. Aka `byte[] arr1 = Enumerable.Repeat((byte)0x20, 100).ToArray();` – Ray Jul 26 '17 at 14:15
36
var array = Encoding.ASCII.GetBytes(new string(' ', 100));
Yuriy Rozhovetskiy
  • 1
    Just a question, does the array now contain the null terminator produced by using new string(...)? – Neil Sep 30 '14 at 15:17
  • 2
    @Neil: Actually, there is no answer to your question, because new string() does not produce a null terminator (visible to .NET). In .NET, we don't think about it, and we don't worry about it. It's simply not there. – Oliver Dec 01 '14 at 21:50
  • 1
    Works correctly, even to fill with 0x00 bytes: Encoding.ASCII.GetBytes(new string((char)0, 100)); – Ben Aug 03 '16 at 19:44
  • Funny that I can use many values, but nothing higher than 0x7F. If I use 0x80 or higher, the buffer is filled with 0x3F. So that's lower 128 ASCII only. And this is almost 10x slower than John's answer. – ajeh Jan 10 '18 at 21:43
  • 1
    @ajeh: That is because the ASCII character set is only the "lower" 128 values, 0x00-0x7F. The "upper" values (0x80-0xFF) are Extended ASCII. The .NET Encoding.ASCII returns 0x3F (or "?") for the unknown/extended values. – mharr Oct 07 '19 at 14:23
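The 0x7F limit described in the comments above is easy to reproduce. A minimal sketch, where `AsciiFillExample` and `Fill` are hypothetical names used only for this illustration:

```csharp
using System;
using System.Text;

public static class AsciiFillExample
{
    // Builds a string of 'count' copies of 'c' and encodes it as ASCII.
    // Characters above 0x7F are not representable in ASCII, so the
    // encoder substitutes '?' (0x3F) for each of them.
    public static byte[] Fill(char c, int count)
    {
        return Encoding.ASCII.GetBytes(new string(c, count));
    }
}
```

So `AsciiFillExample.Fill(' ', 6)` gives six 0x20 bytes, while `AsciiFillExample.Fill((char)0x80, 6)` gives six 0x3F bytes rather than 0x80. For the space-filling case in the question this does not matter, but it rules the trick out as a general-purpose fill.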
20

If you need to initialise a small array you can use:

byte[] smallArray = new byte[] { 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20 };

If you have a larger array, then you could use:

byte[] bitBiggerArray = Enumerable.Repeat((byte)0x20, 7000).ToArray();

Which is simple, and easy for the next guy/girl to read. And will be fast enough 99.9% of the time. (Normally will be the BestOption™)

However if you really really need super speed, calling out to the optimized memset method, using P/invoke, is for you: (Here wrapped up in a nice to use class)

using System;
using System.Runtime.InteropServices;

public static class Superfast
{
    // memset from the Microsoft C runtime (Windows only).
    // The count parameter is a size_t in C, hence UIntPtr here.
    [DllImport("msvcrt.dll",
              EntryPoint = "memset",
              CallingConvention = CallingConvention.Cdecl,
              SetLastError = false)]
    private static extern IntPtr MemSet(IntPtr dest, int c, UIntPtr count);

    public static byte[] InitByteArray(byte fillWith, int size)
    {
        byte[] arrayBytes = new byte[size];
        // Pin the array so the GC cannot move it while native code writes to it.
        GCHandle gch = GCHandle.Alloc(arrayBytes, GCHandleType.Pinned);
        try
        {
            MemSet(gch.AddrOfPinnedObject(), fillWith, (UIntPtr)(uint)arrayBytes.Length);
        }
        finally
        {
            // Always release the handle, otherwise the pinned array leaks.
            gch.Free();
        }
        return arrayBytes;
    }
}

Usage:

byte[] oneofManyBigArrays =  Superfast.InitByteArray(0x20,700000);
Nottoc
DarcyThomas
  • 1
    Hey Mister! I have tested your solution. It is fast but it causes memory leaking. While using .Alloc method along with GCHandleType.Pinned type argument, you should remember to use .Free on GCHandle to release to resources. More you can read in documentation: https://learn.microsoft.com/pl-pl/dotnet/api/system.runtime.interopservices.gchandle?view=netframework-4.8 – Kacper Werema Aug 14 '19 at 08:49
  • @KacperWerema Leaks that's no good! Feel free to edit my answer. (I don't have access to a PC to validate the code myself right now) – DarcyThomas Aug 15 '19 at 02:44
  • Annoying though that there is no .NET memset solution like there is for memcpy with Array.Copy… For loops and LINQ are both terrible at large scales. – Gregory Morse Feb 17 '20 at 05:07
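On the point about a missing .NET memset: newer runtimes do ship a built-in fill. The sketch below assumes .NET Core 2.1 or later (or .NET 5+), where both `Array.Fill` and `Span<T>.Fill` exist; neither is part of the older .NET Framework versions discussed in this thread. The class and method names are illustrative only.

```csharp
using System;

public static class BuiltInFill
{
    // Allocates an array and fills it with Array.Fill.
    public static byte[] WithArrayFill(int length, byte value)
    {
        var buffer = new byte[length];
        Array.Fill(buffer, value);
        return buffer;
    }

    // Same result through a span over the array.
    public static byte[] WithSpanFill(int length, byte value)
    {
        var buffer = new byte[length];
        buffer.AsSpan().Fill(value);
        return buffer;
    }
}
```

Either form keeps the call site down to one line, e.g. `UserCode = BuiltInFill.WithArrayFill(6, 0x20);`. Whether it beats a plain loop for a given size is something to measure rather than assume.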
5

This is a faster version of the code from the post marked as the answer.

All of the benchmarks that I have performed show that a simple for loop that only contains something like an array fill is typically twice as fast if it is decrementing versus if it is incrementing.

Also, the array Length property is already passed as the parameter so it doesn't need to be retrieved from the array properties. It should also be pre-calculated and assigned to a local variable. Loop bounds calculations that involve a property accessor will re-compute the value of the bounds before each iteration of the loop.

public static byte[] CreateSpecialByteArray(int length)
{
    byte[] array = new byte[length];

    int len = length - 1;

    for (int i = len; i >= 0; i--)
    {
        array[i] = 0x20;
    }

    return array;
}
deegee
5

Maybe these could be helpful?

What is the equivalent of memset in C#?

http://techmikael.blogspot.com/2009/12/filling-array-with-default-value.html

Community
fxdxpz
  • 1
    Interesting links that suggest the currently upvoted answers are actually less efficient than the loop for large sizes. – Chris May 27 '11 at 09:24
  • Good point, but these fields are all fairly small as they each only read a single value from a database. I like the Enumerable method since there are quite a few files that this program has to process and generate and they all are done in this manner, so it makes the code much more compact. – DeVil May 27 '11 at 09:29
  • 1
    @DeVil: if you want compact code you could easily just create a method with a signature something like PopulateByteArray(byte[] array, byte value) and then have your code in that. I'd say that was probably even neater than repeating the Enumerable.Repeat all over the place and has the advantage of better efficiency too. ;-) – Chris May 27 '11 at 09:32
  • Agreed. Seems I may have been a bit hasty in my acceptance of the `Enumerable.Repeat` method. – DeVil May 27 '11 at 09:47
5

The guys before me gave you your answer. I just want to point out your misuse of the foreach loop. Since you have to increment an index anyway, a standard for loop would be not only more compact but also more efficient (foreach does many things under the hood):

for (int index = 0; index < UserCode.Length; ++index)
{
    UserCode[index] = 0x20;
}
gwiazdorrr
  • You may be right. I was implementing this particular part of the code one Saturday afternoon (no overtime pay ;( ) and my brain was at that point where I was just panel-beating code to make it work. It's been bugging me since and I've only now come back to look at it. – DeVil May 27 '11 at 09:26
  • 1
    If you are running on a machine with OoO execution, dividing the buffer size by 2 or 4, etc, and assigning `buf[i]`, `buf[i+1]` etc will be much faster, by a factor of 2x on the current i5 and i7. But still not as fast as John's answer. – ajeh Jan 10 '18 at 22:22
4

The fastest way to do this is to use the api:

bR = 0xFF;

RtlFillMemory(pBuffer, nFileLen, bR);

using a pointer to a buffer, the length to write, and the encoded byte. I think the fastest way to do it in managed code (though much slower than the API call) is to create a small block of initialized bytes, then use Buffer.BlockCopy to write them to the byte array in a loop. I threw this together and haven't tested it, but you get the idea:

long size = GetFileSize(FileName);
const int blocksize = 1024;
// template block, pre-filled with 0xFF below
byte[] ntemp = new byte[blocksize];
// destination buffer
byte[] nbyte = new byte[size];
// init the template block
for (int i = 0; i < blocksize; i++)
    ntemp[i] = 0xff;

// number of whole blocks, and the bytes left over after them
int blocks = (int)(size / blocksize);
int remainder = (int)(size - ((long)blocks * blocksize));

// copy whole blocks; nothing is copied here when size < blocksize
for (int count = 0; count < blocks; count++)
{
    Buffer.BlockCopy(ntemp, 0, nbyte, blocksize * count, blocksize);
}

// copy remaining bytes
Buffer.BlockCopy(ntemp, 0, nbyte, blocksize * blocks, remainder);
JGU
4

Just to expand on my answer a neater way of doing this multiple times would probably be:

PopulateByteArray(UserCode, 0x20);

which calls:

public static void PopulateByteArray(byte[] byteArray, byte value)
{
    for (int i = 0; i < byteArray.Length; i++)
    {
        byteArray[i] = value;
    }
}

This has the advantage of a nice efficient for loop (see gwiazdorrr's answer) as well as a neat-looking call if it is being used a lot. And I personally think it is a lot more readable at a glance than the Enumerable one. :)
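Applied to the scenario in the question, the constructor could then look something like this sketch. `UserCode` and its length of 6 come from the question; the class name `FileRecord` and the `AccountName` field are hypothetical stand-ins for the real record layout.

```csharp
using System;

public class FileRecord
{
    // UserCode (6 bytes) is taken from the question.
    public byte[] UserCode = new byte[6];
    // Hypothetical second field, only here to show the pattern repeating.
    public byte[] AccountName = new byte[20];

    public FileRecord()
    {
        // One short call per field instead of one loop per field.
        PopulateByteArray(UserCode, 0x20);
        PopulateByteArray(AccountName, 0x20);
    }

    public static void PopulateByteArray(byte[] byteArray, byte value)
    {
        for (int i = 0; i < byteArray.Length; i++)
        {
            byteArray[i] = value;
        }
    }
}
```

Each additional field costs one line in the constructor, which keeps the class close to the one-line memset of the C++ original.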

Chris
3

This function is way faster than a for loop for filling an array.

The Array.Copy command is a very fast memory copy function. This function takes advantage of that by repeatedly calling the Array.Copy command and doubling the size of what we copy until the array is full.

I discuss this on my blog at https://grax32.com/2013/06/fast-array-fill-function-revisited.html (Link updated 12/16/2019). Also see Nuget package that provides this extension method. http://sites.grax32.com/ArrayExtensions/

Note that this would be easy to make into an extension method by just adding the word "this" to the method declarations i.e. public static void ArrayFill<T>(this T[] arrayToFill ...

public static void ArrayFill<T>(T[] arrayToFill, T fillValue)
{
    // if called with a single value, wrap the value in an array and call the main function
    ArrayFill(arrayToFill, new T[] { fillValue });
}

public static void ArrayFill<T>(T[] arrayToFill, T[] fillValue)
{
    if (fillValue.Length >= arrayToFill.Length)
    {
        throw new ArgumentException("fillValue array length must be smaller than length of arrayToFill");
    }

    // set the initial array value
    Array.Copy(fillValue, arrayToFill, fillValue.Length);

    int arrayToFillHalfLength = arrayToFill.Length / 2;

    for (int i = fillValue.Length; i < arrayToFill.Length; i *= 2)
    {
        int copyLength = i;
        if (i > arrayToFillHalfLength)
        {
            copyLength = arrayToFill.Length - i;
        }

        Array.Copy(arrayToFill, 0, arrayToFill, i, copyLength);
    }
}
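To see the doubling idea in isolation, here is a simplified byte-only sketch. It is written for illustration and is not the code from the blog post or the NuGet package; `DoublingFill` and `FillByDoubling` are made-up names.

```csharp
using System;

public static class DoublingFill
{
    // Seeds the first element, then repeatedly copies the already-filled
    // prefix onto the region right after it, doubling the filled length
    // on each pass until the whole array is covered.
    public static void FillByDoubling(byte[] target, byte value)
    {
        if (target == null) throw new ArgumentNullException(nameof(target));
        if (target.Length == 0) return;

        target[0] = value;
        int filled = 1;

        while (filled < target.Length)
        {
            // Never copy past the end of the array on the final pass.
            int copyLength = Math.Min(filled, target.Length - filled);
            Array.Copy(target, 0, target, filled, copyLength);
            filled += copyLength;
        }
    }
}
```

For an array of n elements this makes roughly log2(n) calls to `Array.Copy` instead of n individual assignments, which is where any speed-up would come from. It is worth benchmarking on the target runtime before relying on it.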
Grax32
2

You could speed up the initialization and simplify the code by using the Parallel class (.NET 4 and newer):

public static void PopulateByteArray(byte[] byteArray, byte value)
{
    Parallel.For(0, byteArray.Length, i => byteArray[i] = value);
}

Of course you can create the array at the same time:

public static byte[] CreateSpecialByteArray(int length, byte value)
{
    var byteArray = new byte[length];
    Parallel.For(0, length, i => byteArray[i] = value);
    return byteArray;
}
slfan
  • Note: Parallel class requires .NET 4+ – deegee Jun 20 '13 at 17:58
  • 2
    Have you tested the performance of this? Looks like you would be thread stealing from other work. and you would have the thread management over head. Ok if it is this only thing your code is doing at that time but not if you have other things happening at the same time. – DarcyThomas Aug 26 '16 at 03:38
  • @DarcyThomas The threads come from the ThreadPool. And of course it depends what "other work" is going on. If nothing else is going on, it is up to (#ofCPUs-1) times faster than the conventional loop. – slfan Aug 26 '16 at 05:06
  • 2
    It is quite simple to prove that the `Parallel` class would be a very inefficient overkill for this rudimentary simple task. – ajeh Jan 10 '18 at 20:54
  • @ajeh You are right. I tested it once with a more complex initialisation and it was about 3 times faster on a 4 core machine. In a real application I always do a performance test, before I use the Parallel class. – slfan May 11 '18 at 14:55
2

You can use an array initializer:

UserCode = new byte[]{0x20,0x20,0x20,0x20,0x20,0x20};

This will work better than Repeat if the values are not identical.

Oded
  • 2
    Useful for small arrays but definitely not for bigger ones. :) – Chris May 27 '11 at 09:16
  • Indeed. I'm aware of this way of initializing, but there are a LOT of fields and they all vary in size. This method would be even more painful than my loops. – DeVil May 27 '11 at 09:19