0

I have a potentially large int array that I'm writing to a file using BinaryWriter. Of course, I can use the default approach.

using (BinaryWriter writer = new BinaryWriter(File.Open(path, FileMode.Create)))
{
    writer.Write(myIntArray.Length);
    foreach (int value in myIntArray)
        writer.Write(value);
}

But this seems horribly inefficient. I'm pretty sure an int array stores data contiguously in memory. Is there no way to just write the memory directly to a file like you can with a byte array? Maybe a way to cast (not copy) the int array to a byte array?

Jonathan Wood
  • 1
    [Something like this?](https://stackoverflow.com/a/621506/424129) – 15ee8f99-57ff-4f92-890c-b56153 May 07 '19 at 19:53
  • 1
    @EdPlunkett: Well, you get the idea, but that still loops through, converts and copies each element. I would expect that to be a lot slower than my current code. The point is that, internally, everything is going to be passed as a memory buffer to be written to file. And my array starts as a memory buffer. I'm wondering if there's any way to avoid the looping, converting and copying, etc., as I would do in C++. To somehow pass the original memory buffer directly. – Jonathan Wood May 07 '19 at 19:57
  • 1
    True, you really just want the pointer, not to mess around with higher level abstractions. [This perhaps](https://bytes.com/topic/c-sharp/answers/257745-how-convert-system-array-intptr)? OTOH Caius may have a point: Are you optimizing prematurely? It does "seem horribly (well, mildly) inefficient", when you kind of just squint at the code and let your imagination run free, but that's not a measurement. – 15ee8f99-57ff-4f92-890c-b56153 May 07 '19 at 19:59
  • 3
    How much slower than your disk do you think your code is? – Caius Jard May 07 '19 at 19:59
  • 1
    Are you using .NET Core or the full framework? .NET Core has the MemoryMarshal class, which supports this kind of unsafe/nonportable cast for spans. – Mike Zboray May 07 '19 at 20:04
  • @MikeZboray: I'm using .NET Core. I'm looking into that. (Not finding good examples yet.) – Jonathan Wood May 07 '19 at 20:04
  • 1
    As @EdPlunkett implies, do you have any data to suggest this is not efficient? We need to know the measurement you're using if we're to answer the question with something "more efficient". See also [Eric Lippert's blog post on the subject](https://ericlippert.com/2012/12/17/performance-rant/). – Heretic Monkey May 07 '19 at 20:11
  • @HereticMonkey: You know, I just like to write code the way I like to write it. I used C, C++ and assembly language for many years and can plainly see that the default approach to this involves additional steps. And I generally like to avoid unnecessary steps when I code. Call me crazy. That's just me. There is an overload that takes `byte[]`; I'm just trying to find out if there's a way to do the same thing with `int[]`. – Jonathan Wood May 07 '19 at 20:17
  • 1
    I'm all for writing efficient code, but as I said, how can we know whether we've made the mark if you won't give us a ruler by which to measure it? – Heretic Monkey May 07 '19 at 20:19
  • @HereticMonkey: My ruler is the extra step involved in the default approach. If I can eliminate that, then I feel all warm and fuzzy. – Jonathan Wood May 07 '19 at 20:23
  • @JonathanWood If all you’re concerned about is source code aesthetics, I recommend the SelectMany(). It’s unfortunate that “efficiency” at SO has come to mean something about typing. – 15ee8f99-57ff-4f92-890c-b56153 May 07 '19 at 21:06
  • @EdPlunkett: I care about aesthetics, but this post is all about efficiency. I'd have to review the source code for `SelectMany()`, but I don't think there is anything efficient about it. – Jonathan Wood May 07 '19 at 21:16
  • @JonathanWood If you aren’t measuring performance, you aren’t taking efficiency into account. Unverified assumptions are a proverbially unreliable source of information. – 15ee8f99-57ff-4f92-890c-b56153 May 07 '19 at 21:25
  • @EdPlunkett: I can repeat what I said earlier. I've worked with a lot of lower level languages and understand the extra step that is happening in my code above. I enjoy writing code that is efficient. If you think extra steps don't affect efficiency, that's fine. It's not an argument. I like to write code how I like to write it. And I was just trying to figure out if there was any way to do that here in C#. I don't need verification to know what type of code I like to write. – Jonathan Wood May 07 '19 at 21:36
  • @EdPlunkett: And by the way, to you and all the other people who always bring up premature optimization. You can't tune up an application that was written from scratch by someone who doesn't enjoy writing efficient code. Everyone else can write the type of code they want, so allow me to write the type of code I want. Sheesh! – Jonathan Wood May 07 '19 at 21:38

2 Answers

3

.NET Core supports the most efficient form, with no copying, via MemoryMarshal.Cast and Span<T>. This directly reinterprets the memory, so the bytes are written in the machine's native byte order. That makes the result potentially nonportable across platforms, so it should be used with care:

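 using System;
 using System.IO;
 using System.Runtime.InteropServices;

 int[] values = { 1, 2, 3 };

 using (var writer = new BinaryWriter(File.Open(path, FileMode.Create)))
 {
     // Reinterpret the int[] memory as bytes without copying.
     Span<byte> bytes = MemoryMarshal.Cast<int, byte>(values.AsSpan());
     // Uses the BinaryWriter.Write(ReadOnlySpan<byte>) overload (.NET Core 2.1 and later).
     writer.Write(bytes);
 }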

Some relevant discussion of this API when it was moved from MemoryExtensions.NonPortableCast

However, I will say your original code is actually going to be fairly efficient, because both BinaryWriter and FileStream have their own internal buffers that are used when writing ints like that.
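As a minimal sketch of how the span-based approach might be used for a full round trip, the helper below writes a length prefix followed by the array's raw bytes, reads them back, and checks that the byte view really aliases the array. The class `IntArrayFile` and its method names are hypothetical and not part of the answer above. It assumes a runtime with the span overloads on BinaryWriter and BinaryReader (.NET Core 2.1 or later) and that reader and writer share the same byte order.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical helper for illustration only.
public static class IntArrayFile
{
    // Writes a 4-byte length prefix, then the array's memory in a single call.
    public static void Write(Stream output, int[] values)
    {
        using (var writer = new BinaryWriter(output, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write(values.Length);
            // Cast creates a byte view over the same memory; no per-element loop or copy here.
            ReadOnlySpan<byte> bytes = MemoryMarshal.Cast<int, byte>(values.AsSpan());
            writer.Write(bytes);
        }
    }

    // Reads the length prefix, then fills a new array through its byte view.
    public static int[] Read(Stream input)
    {
        using (var reader = new BinaryReader(input, Encoding.UTF8, leaveOpen: true))
        {
            int length = reader.ReadInt32();
            var values = new int[length];
            Span<byte> bytes = MemoryMarshal.Cast<int, byte>(values.AsSpan());
            // A read may return fewer bytes than requested, so loop until the view is full.
            int total = 0;
            while (total < bytes.Length)
            {
                int n = reader.Read(bytes.Slice(total));
                if (n == 0) throw new EndOfStreamException();
                total += n;
            }
            return values;
        }
    }

    // Returns true if changing a byte through the view changes the array,
    // which shows the view references the array's memory rather than a copy.
    // Expects a non-empty array; the original value is restored before returning.
    public static bool SharesMemory(int[] values)
    {
        Span<byte> view = MemoryMarshal.Cast<int, byte>(values.AsSpan());
        int before = values[0];
        view[0] ^= 0xFF;               // flip bits through the byte view
        bool changed = values[0] != before;
        view[0] ^= 0xFF;               // restore the original byte
        return changed;
    }
}
```

Note that the length prefix goes through `BinaryWriter.Write(int)`, which always writes little-endian, while the payload is written in native byte order. On a little-endian machine the two agree, which is where the portability caveat comes from.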

Mike Zboray
  • Do you know what `AsSpan()` does? Does it have to copy or traverse anything? Or does it really end up with `bytes` just being a reference to the same memory occupied by `values`? – Jonathan Wood May 07 '19 at 20:21
  • AsSpan creates a `Span` which is a "stack only" value type. Its design purpose was to be able to reference regions of memory without having to do additional allocations to access them. A region of memory could be a slice of a heap allocated array or a stack allocated array, for example. [This](https://msdn.microsoft.com/en-us/magazine/mt814808.aspx) article by Stephen Toub is a good summary. – Mike Zboray May 07 '19 at 20:26
  • It's a managed (type and memory safe) reference to a contiguous chunk of memory. – Justin Blakley May 07 '19 at 20:27
  • @MikeZboray: I can see it creates and returns a `Span`. I was just curious what it actually does under the covers. I will read the article. There are a few things that bother me about C#, and this is one of them. But this appears to be the best answer. Thanks. – Jonathan Wood May 07 '19 at 20:29
  • @JonathanWood From the little C++ I've done, yes it is very different. C# asks you to give up some control in exchange for making development easier. This can be unsettling to C++ veterans who are used to knowing exactly what is happening at all times because if you don't you will get bitten. In any event, I don't think this approach actually gets you very much performance because of the internal buffers used by BinaryWriter and FileStream in your original code. – Mike Zboray May 07 '19 at 22:45
  • @MikeZboray: Well, it still performs another step. The internal buffers don't change that. But I think I agree that it's probably not worth making my code somewhat obfuscated and risk lack of portability for the slight gain in performance this is likely to bring. – Jonathan Wood May 07 '19 at 23:30
  • writer.Write(bytes); should be writer.Write(bytes.ToArray()); – Oleg Bondarenko May 08 '19 at 08:22
  • @OlegBondarenko: I don't think so. In .NET Core, there is an overload that accepts a span. – Jonathan Wood May 08 '19 at 12:27
3

I thought it would be interesting to benchmark each of the methods outlined above, the original from @Jonathan-Wood (TestCopyStream), the Span suggestion from @Mike-Zboray (TestCopySpan) and the Buffer BlockCopy from @oleg-bondarenko (TestCopySpanByteCopy) [yup, naming things is hard].

I'm generating int arrays of size N of random numbers, the same set for each run.
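The benchmark code itself is not shown here. As a rough sketch of what the three compared variants might look like, the hypothetical class below writes the same array three ways into a MemoryStream so the outputs can be compared. The class and method names, the fixed seed, and the use of MemoryStream instead of a file are all assumptions made for illustration, not the benchmarked code.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical helper; names are illustrative and not taken from the benchmark.
public static class WriteVariants
{
    // A fixed seed is one way to get the same pseudo-random input on every run (assumption).
    public static int[] MakeData(int n, int seed = 12345)
    {
        var rng = new Random(seed);
        var data = new int[n];
        for (int i = 0; i < n; i++)
            data[i] = rng.Next();
        return data;
    }

    // Element-by-element writes, as in the question.
    public static byte[] ElementByElement(int[] values)
    {
        var stream = new MemoryStream();
        using (var writer = new BinaryWriter(stream))
        {
            foreach (int value in values)
                writer.Write(value);
        }
        return stream.ToArray();
    }

    // Reinterpret the array as bytes and write it in one call, with no intermediate array.
    public static byte[] SpanCast(int[] values)
    {
        var stream = new MemoryStream();
        using (var writer = new BinaryWriter(stream))
        {
            ReadOnlySpan<byte> bytes = MemoryMarshal.Cast<int, byte>(values.AsSpan());
            writer.Write(bytes);
        }
        return stream.ToArray();
    }

    // Copy into a temporary byte[] with Buffer.BlockCopy, then write that array.
    public static byte[] BlockCopy(int[] values)
    {
        var copy = new byte[values.Length * sizeof(int)];
        Buffer.BlockCopy(values, 0, copy, 0, copy.Length);
        var stream = new MemoryStream();
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(copy);
        }
        return stream.ToArray();
    }
}
```

The two bulk variants always produce identical output, since both emit the array's native byte layout. The element-by-element variant matches them only on little-endian machines, because `BinaryWriter.Write(int)` always writes little-endian. The temporary array in the `BlockCopy` variant is extra allocation that the other two variants avoid.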

Here are the results:

|               Method |     N |     Mean |     Error |    StdDev |   Median | Ratio | RatioSD | Rank |   Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------------------- |------ |---------:|----------:|----------:|---------:|------:|--------:|-----:|--------:|------:|------:|----------:|
|         TestCopySpan |  1000 | 1.372 ms | 0.0382 ms | 0.1109 ms | 1.348 ms |  1.00 |    0.11 |    1 |       - |     - |     - |    4984 B |
|       TestCopyStream |  1000 | 1.377 ms | 0.0324 ms | 0.0935 ms | 1.364 ms |  1.00 |    0.00 |    1 |       - |     - |     - |    4984 B |
| TestCopySpanByteCopy |  1000 | 2.215 ms | 0.0700 ms | 0.2008 ms | 2.111 ms |  1.62 |    0.19 |    2 |  3.9063 |     - |     - |   13424 B |
|                      |       |          |           |           |          |       |         |      |         |       |       |           |
|         TestCopySpan | 10000 | 1.617 ms | 0.1167 ms | 0.3155 ms | 1.547 ms |  0.80 |    0.19 |    1 |       - |     - |     - |     864 B |
|       TestCopyStream | 10000 | 2.032 ms | 0.0776 ms | 0.2251 ms | 1.967 ms |  1.00 |    0.00 |    2 |       - |     - |     - |    4984 B |
| TestCopySpanByteCopy | 10000 | 2.433 ms | 0.0703 ms | 0.2040 ms | 2.430 ms |  1.21 |    0.18 |    3 | 11.7188 |     - |     - |   45304 B |
penderi
  • I have tested with N = 42949672 (Int32.MaxValue/50) - binary size was ~150Mb (I believed that it would be a really huge int array) – Oleg Bondarenko May 10 '19 at 07:49