4

I have an array of shorts:

short[] data;

And I have a function that writes bytes to a file:

void Write(byte[] data);

I do not control this function and cannot change it. Is there a way to write my array of shorts without making a redundant copy first to convert it to bytes?

Something like this: `Write((byte[])data);`

I do not care about endianness. I want memory representation of shorts written to a file in whatever the machine representation of short is. I understand this kind of cast cannot work for any non-POD type that contains references, but shorts should be perfectly convertible. The cast should result in a byte array twice the size that points to the same memory.

If this is impossible in C#, is there anything in the CLR that makes it impossible, or is it just a C# limitation?

kaalus
  • 2
    There's nothing 'redundant' about something your code depends on later on. What is the issue with making a copy, out of curiosity? – JᴀʏMᴇᴇ Nov 28 '16 at 13:34
  • 9
    A short is two bytes, a byte is ... one byte. How do you intend to convert that? – fafl Nov 28 '16 at 13:35
  • *but shorts should be perfectly convertible* Individually, yes. But an array has properties like its length and rank. But you already know you can't cast complex types like this? *I understand this kind of cast cannot work for any non-POD type...* – ta.speot.is Nov 28 '16 at 13:35
  • @JᴀʏMᴇᴇ Waste of memory and CPU cycles? My array is hundreds of MB in size and code needs to run quickly on Android and iOS. – kaalus Nov 28 '16 at 13:36
  • 2
    `Array.ConvertAll(array, item => (byte)item)` is as optimal as you're going to get. This ensures that the array is iterated over only once. Let the compiler deal with the performance implications. If you cared about this kind of low level stuff, you wouldn't be writing in C#. – Cody Gray - on strike Nov 28 '16 at 13:37
  • or use `BitConverter.GetBytes(data);` – Charles Bretana Nov 28 '16 at 13:39
  • It's a shame the `Write()` method wasn't `Write(IEnumerable data)`... because then you would have more options. – Matthew Watson Nov 28 '16 at 13:44
  • "The cast should result in a byte array twice the size that points to the same memory." Says who? Obviously this is not what the spec says, and thus what you want is not possible. Anyway, when writing `(byte[]) data` you say `data` **is** an array of bytes, which it clearly isn't. – MakePeaceGreatAgain Nov 28 '16 at 13:44
  • @MatthewWatson It isn't `Write(IEnumerable data)` because what it does under the hood is to pass the pointer to the start of the array data to I/O functions of the OS. If it was IEnumerable, the Write function would have to make a copy, so no difference really. – kaalus Nov 28 '16 at 13:49
  • @fafl The cast should result in an array of bytes twice the size. – kaalus Nov 28 '16 at 13:49
  • @CodyGray The performance implications of a copy are unacceptable. My array is hundreds of megabytes in size and the code needs to run on mobile. The data is in memory. The Write function passes the pointer to the data to OS I/O functions. Why does the C# type system straitjacket me into making another copy of the data that will be bit-identical to what I already have? – kaalus Nov 28 '16 at 13:52
  • @HimBromBeere `(byte[])data` syntax is only an example. It could look like that `data.GetRawBytes()` if it makes any difference. – kaalus Nov 28 '16 at 13:54
  • 1
    Then the performance implications of using C# are unacceptable. This is how it works. And no, of course the copy will not be bit-identical to what you have. Arrays are first-class types in the CLR, and they are type-aware. It'll have a different length, for starters. This isn't C. You aren't just passing a pointer to the first element. If you wanted to write the kind of low level code that you're describing (I don't blame you; I think this way, too), then you shouldn't have chosen C#. The type system "straightjackets" you to provide safety: it is all by design. You just don't like the design. – Cody Gray - on strike Nov 28 '16 at 13:56
  • Your other option is to change the design so that the array *always* has the type `byte`, and never `short`. Then, pass the `byte` array around, but interpret it as if it were an array of shorts, just like you were hoping to get the `Write` function to do. No guarantees that this will be perfect, though. The type system will actively interfere with your desire to subvert it. – Cody Gray - on strike Nov 28 '16 at 13:58
  • How do you create the array of shorts? Could you create the byte array instead? – fafl Nov 28 '16 at 14:02
  • @CodyGray I agree fully with the type safety. In case of POD arrays though, casts do not compromise safety. There is no risk of overwriting memory or inadvertently calling virtual functions of another type. It is my feeling that the fact that such mechanism is not allowed is a big and unnecessary omission from C# - a lot more important than e.g. thousands separator in literals that C#7 cares to add. – kaalus Nov 28 '16 at 14:03
  • 1
    You are trying to make C# into a different language than it is trying to be. Arrays are not POD types. They are objects, implicitly inheriting from System.Array. They are first-class types in the CLR, and nothing in C# is going to change that. – Cody Gray - on strike Nov 28 '16 at 14:19
  • Anyway, it seems pretty likely that the IO is going to be the slow bit; converting short[] to byte[] is small potatoes in comparison. – Matthew Watson Nov 28 '16 at 14:43
  • @MatthewWatson On mobile the memory consumed is likely to be relevant, as there isn't necessarily going to be much RAM. It's not the speed of the copy, it's the memory footprint. The real problem here is that the IO isn't streaming the input data when it should be. (And that's likely to end up being a problem even if he *could* do what he wants to do, which he can't.) – Servy Nov 28 '16 at 14:49
  • My answer [here](https://stackoverflow.com/a/58937184/543814) seems to reflect what you want, except that you are limited by a `byte[]` parameter instead of a span. In case the info helps at all. (Note that you might also copy to a stackalloc'ed byte[], which is not what you are asking, but _could_ be beneficial.) – Timo Nov 19 '19 at 15:47

2 Answers

5

> I do not care about endianness. I want memory representation of shorts written to a file in whatever the machine representation of short is.

This is the first impossible thing - endianness changes the memory representation, so reading from successive byte addresses starting at the address of the first short in the array will result in different byte patterns depending on the machine endianness.
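To make the byte-order point concrete, here is a minimal sketch. The class and method names (`EndiannessDemo`, `RawBytes`) are made up for illustration; `BitConverter.GetBytes` and `BitConverter.IsLittleEndian` are standard .NET members.

```csharp
using System;

public static class EndiannessDemo
{
    // Returns the two bytes of a short in the machine's native byte order.
    public static byte[] RawBytes(short value)
    {
        return BitConverter.GetBytes(value);
    }

    public static void Main()
    {
        byte[] raw = RawBytes(0x0102);

        // A little-endian machine stores the low byte first (02-01);
        // a big-endian machine stores the high byte first (01-02).
        string order = BitConverter.IsLittleEndian ? "little-endian" : "big-endian";
        Console.WriteLine(order + ": " + BitConverter.ToString(raw));
    }
}
```

If the file is only ever read back on the machine that wrote it, either order round-trips correctly.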

The second impossible thing is that arrays in the CLR have type and length information encoded with the data. You cannot change this header information, or else you would break the garbage collector. So given a short[] array, you cannot convert it to a byte[] array. You might obtain a byte pointer to the data using C++/CLI or unsafe C# code, but that still does not give you a CLR array.
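As a rough illustration of "a byte view, but not a byte[]", the sketch below uses `MemoryMarshal.AsBytes`, which is available in newer runtimes (.NET Core 2.1 and later, or via the System.Memory package). That availability is an assumption about your platform. `ByteViewDemo` and `AsByteView` are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;

public static class ByteViewDemo
{
    // Reinterprets the short[] storage as bytes without copying.
    // The result is a Span<byte>, not a byte[].
    public static Span<byte> AsByteView(short[] data)
    {
        return MemoryMarshal.AsBytes(data.AsSpan());
    }

    public static void Main()
    {
        short[] data = { 1, 2, 3 };
        Span<byte> bytes = AsByteView(data);

        Console.WriteLine(bytes.Length); // 6: two bytes per short

        // Writing through the view modifies the original array.
        bytes[0] = 0xFF;
        bytes[1] = 0xFF;
        Console.WriteLine(data[0]); // -1 on any endianness

        // Write(bytes) would not compile: Write expects a byte[].
    }
}
```

This shows the same memory viewed as bytes, but it only helps if the consuming API accepts a span or a pointer. It does not produce an array object that `Write(byte[])` can take.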

If you really cannot control the code which takes the byte array, you might be able to change the code manipulating the shorts so that the data lives in a byte array from the start. There are a few ways to do that: use a MemoryStream over the byte array to read and write data, wrap the array as an IList<short>, or create accessor extension functions that get the data as shorts.

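public sealed class ShortList : IList<short>
{
    private readonly byte[] _array;

    public ShortList(byte[] array)
    {
        _array = array;
    }

    public short this[int index]
    {
        // Element `index` occupies bytes index*2 and index*2+1.
        // This reads the high byte first (big-endian); swap the two
        // offsets if the data is stored little-endian.
        get { return (short)((_array[index * 2] << 8) | _array[index * 2 + 1]); }
    }

    public int Count
    {
        get { return _array.Length / 2; }
    }

    // ... many more methods in IList<short>
}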
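The accessor-extension option mentioned above could look roughly like this. It is a sketch, not a complete design: `ShortAccessors`, `GetShort` and `SetShort` are hypothetical names, and unlike the wrapper class it uses the machine's native byte order.

```csharp
using System;

public static class ShortAccessors
{
    // Reads the short at element position `index` (byte offset index * 2)
    // using the machine's native byte order.
    public static short GetShort(this byte[] buffer, int index)
    {
        return BitConverter.ToInt16(buffer, index * 2);
    }

    // Writes `value` at element position `index` in native byte order.
    public static void SetShort(this byte[] buffer, int index, short value)
    {
        byte lo = unchecked((byte)value);
        byte hi = unchecked((byte)(value >> 8));
        int offset = index * 2;

        if (BitConverter.IsLittleEndian)
        {
            buffer[offset] = lo;
            buffer[offset + 1] = hi;
        }
        else
        {
            buffer[offset] = hi;
            buffer[offset + 1] = lo;
        }
    }
}
```

With this, the data is a `byte[]` all along (`buffer.SetShort(i, value)`, `buffer.GetShort(i)`), so it can be handed to `Write` with no conversion step.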
Pete Kirkham
  • Marking this as answer. I didn't realize array length and type are stored along with the data in CLR. If so, it's impossible in C# or any CLR-based language to have 2 Array objects of different lengths and types pointing to the same data. – kaalus Nov 29 '16 at 11:59
  • "I do not care about endianness" means that I do not care in what order the bytes will be written. The data will only ever be read by the same device that wrote it. Plus I actually know all my platforms are little endian anyway. – kaalus Nov 29 '16 at 12:01
  • @kaalus `it's impossible in C# or any CLR-based language to have 2 Array objects of different lengths and types pointing to the same data.` It's not, actually. .NET supports array covariance, it just doesn't support it for value types, only reference types. Well, they'd be the same length; treating one reference type array as another reference type would never result in it having a different length, but the (compile time) types would be different. – Servy Nov 30 '16 at 14:06
-3

What about

Write(data.SelectMany(x => BitConverter.GetBytes(x)).ToArray());
Maarten
  • 2
    This makes a copy of the array, which the OP indicated he doesn't want to do. – Maarten Nov 28 '16 at 13:55
  • 3
    Your answer makes two, if not three copies of the data. The array is hundreds of megabytes. Code runs on a mobile phone. – kaalus Nov 28 '16 at 13:56
  • @kaalus It only makes one copy of the data, no more. If you use a write operation that doesn't require all of the data to be in a materialized array, and instead use one that takes a stream or an `IEnumerable`, you can perform an operation like this with O(1) additional memory. – Servy Nov 28 '16 at 14:20
  • Copy 1: `BitConverter.GetBytes` copies the data from each short into its own short-lived array. Copies 2 and a half: `ToArray()` on an `IEnumerable` that is not backed by an `ICollection` (which is the case here, as `SelectMany` uses yield) grows its internal buffer from 4 elements to the required size in increasing powers of two, so it copies about half the data twice on average. – Pete Kirkham Nov 30 '16 at 15:55
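If the memory footprint is the real concern, one way to bound it is to copy through a small reusable buffer. The sketch below assumes something the question does not state: that `Write` may be called several times and appends each call's bytes to the file. The `write` delegate stands in for that function, and `ChunkedWriter`, `WriteShorts` and `chunkShorts` are hypothetical names.

```csharp
using System;

public static class ChunkedWriter
{
    // Copies `data` to `write` in chunks of at most `chunkShorts` elements,
    // so the extra memory is bounded by the chunk size, not the array size.
    // ASSUMPTION: `write` consumes the buffer synchronously and appends.
    public static void WriteShorts(short[] data, Action<byte[]> write, int chunkShorts = 32 * 1024)
    {
        byte[] buffer = new byte[chunkShorts * sizeof(short)];

        for (int offset = 0; offset < data.Length; offset += chunkShorts)
        {
            int count = Math.Min(chunkShorts, data.Length - offset);

            // Write(byte[]) takes no length argument, so the final
            // partial chunk needs an array of exactly the right size.
            if (count < chunkShorts)
            {
                buffer = new byte[count * sizeof(short)];
            }

            // Buffer.BlockCopy offsets and counts are in bytes.
            Buffer.BlockCopy(data, offset * sizeof(short), buffer, 0, count * sizeof(short));
            write(buffer);
        }
    }
}
```

A call would look like `ChunkedWriter.WriteShorts(data, Write);`. This still copies every byte once, but it never holds a second full-size array. The `int` byte offsets limit it to arrays under 2 GB.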