2

I work with a binary format that contains several magic byte sequences. I want to keep them in a static class as immutable static members.

public static class HuffmanConsts
{
    // output format: Header, serialized tree (prefix), DataDelimiter, coded data (logical blocks are 8 byte large, Little Endian)
    public const string Extension = ".huff";
    public static readonly IReadOnlyList<byte> Header = Array.AsReadOnly(new byte[] {0x7B, 0x68, 0x75, 0x7C, 0x6D, 0x7D, 0x66, 0x66}); // string {hu|m}ff
    public static readonly IReadOnlyList<byte> DataDelimiter = Array.AsReadOnly(BitConverter.GetBytes(0L)); // eight binary zeroes, regardless of endianness
}

ReadOnlyCollection<byte> (returned from Array.AsReadOnly()) prevents outside code from changing the values, unlike byte[].

But now, I cannot output Header via stream.Write(), because it requires byte[]:

stream.Write(HuffmanConsts.Header, 0, HuffmanConsts.Header.Count)

Is there an elegant way to write the Header? Or do I have to write a loop and feed the bytes into the stream one by one?

Palec
  • 12,743
  • 8
  • 69
  • 138
  • Searched on [SO], MSDN and tried queries like `stream.write readonlycollection byte` but got no relevant results. – Palec Jan 11 '16 at 17:48
  • `Stream` requires `byte[]`. Dot. You need to sacrifice either some OOP concepts or performance. The choice is yours. – Ivan Stoev Jan 11 '16 at 17:48
  • 1
    I would encapsulate the serialization/deserialization itself. Consider a class with [static] methods void WriteHeader(stream), WriteDelimiter(stream), ReadHeader(stream), ... – alexm Jan 11 '16 at 17:50

2 Answers2

6

Just Making the Output Array Immutable

You could consider something like this:

public static class HuffmanConsts {
   // output format: Header, serialized tree (prefix), DataDelimiter,
   // coded data (logical blocks are 8 byte large, Little Endian)
   public const string Extension = ".huff";

   private static readonly IReadOnlyList<byte> _header =
      // string {hu|m}ff
      Array.AsReadOnly(new byte[] {0x7B, 0x68, 0x75, 0x7C, 0x6D, 0x7D, 0x66, 0x66});
   private static readonly IReadOnlyList<byte> _dataDelimiter =
      // eight binary zeroes, regardless of endianness
      Array.AsReadOnly(BitConverter.GetBytes(0L)); 

   public static byte[] Header { get { return _header.ToArray(); } }
   public static byte[] DataDelimiter { get { return _dataDelimiter.ToArray(); } }
}

Dealing With Any Performance Implications of ToArray

The overhead of ToArray() would be incurred each time you access these properties, however. To alleviate that potential performance penalty (note: testing is in order to see if there actually is one!), you can use System.Buffer.BlockCopy:

private static readonly byte[] _header =
   // string {hu|m}ff
   new byte[] {0x7B, 0x68, 0x75, 0x7C, 0x6D, 0x7D, 0x66, 0x66};
private static int BYTE_SIZE = 1;
private static byte[] GetHeaderClone() {
   byte[] clone = new byte[_header.Length];
   Buffer.BlockCopy(_header, 0, clone, 0, _header.Length * BYTE_SIZE);
   return clone;
}

A Better Solution: Encapsulate Writing to the Stream

You could also create extension methods that let your consumers stop messing about with the details of writing these stream components themselves, for example, the WriteHeader method might look like this:

public static class StreamExtensions {
   // include BlockCopy code from above
   public static void WriteHuffmanHeader(this Stream stream) {
      var header = GetHeaderClone();
      stream.Write(header, 0, header.Length);
   }
}

This would not make the array immutable, but being private that is not an issue.

A Possibly Even Better Solution: Encapsulate a Huffman Stream Object

You also have the option of implementing your own HuffmanStream which takes care of the details of the header and other aspects for you! I actually think this is ideal as it would encapsulate all the concerns of Huffman streams into a testable bit of code that is not duplicated every place you need to work with one.

public class HuffmanStream : Stream {
   private Stream _stream = new MemoryStream();
   private static byte[] _header = ... ;
   public HuffmanStream( ... ) {
      ...
      _stream.Write(_header, 0, _header.Length)
      // the stream already has the header written at instantiation time
   }
}

Note: when passing a byte[] instance to Stream.Write(), it may be modified after the method returns as the method gets direct access to the array. Well-behaved Stream implementations don't do that, but to be safe against custom streams, you must treat Stream instances as hostile and therefore never pass them a reference to an array that shouldn't be changed. For example, any time you want to pass your _header byte array to possiblyHostileStream.Write(), you need to pass _header.Clone() instead. My HuffmanStream does not need this because it uses MemoryStream, which can be trusted.

Palec
  • 12,743
  • 8
  • 69
  • 138
ErikE
  • 48,881
  • 23
  • 151
  • 196
  • The edit looks great. I wrote my own answer with different approaches, but writing HuffmanStream wrapper around the actual stream is probably going to be the ultimate solution. [Huffman coding](https://en.wikipedia.org/wiki/Huffman_coding) requires writing bit sequences not evenly divisible into bytes, so HuffmanStream might implement Write() for my representation of these bit sequences as well. – Palec Jan 11 '16 at 19:28
  • Actually, good faith assumption is always needed with `ReadOnlyCollection`, @usr. Through a cast, you can get it from the `IReadOnlyList` interface and use its [`Items` property](https://msdn.microsoft.com/en-us/library/ms132509.aspx) and you just got behind the read-only facade. I believe that passing the underlying array to a `Stream`’s `Write` method is OK, because the stream has no reason to alter the array. And as I read [its documentation](https://msdn.microsoft.com/en-us/library/system.io.stream.write.aspx), it would actually be a breach of the contract. Correct me if I’m wrong, please. – Palec Jan 12 '16 at 08:05
  • @Palec always use `someList.AsReadOnly()` to populate an `IReadOnlyCollection` or `IReadOnlyList`. You *cannot* modify that, no good faith needed. Also,`Stream` is abstract so `class MaliciousStream : Stream` is not safe. Any MS-written subclass of `Stream` such as `FileStream` or `MemoryStream` are safe. – ErikE Jan 12 '16 at 08:12
  • Aaaah, the [`Items` property is protected](http://stackoverflow.com/q/17398428/2157640)! That's what I missed. – Palec Jan 12 '16 at 08:52
  • @Palec it might breach the contract, true. The relevant scenario here is multiple uncooperating parties living in the same .NET process. That's why the BCL is so serious about not allowing this. If one library acts up that causes terrible bugs for other components. If you control all of this code I would not bother with any of this and just expose the arrays. Or maybe use one of the 99% protection methods found in this question. – usr Jan 12 '16 at 13:47
1

You can leave the class as it is and convert the Header to byte[] for the stream:

stream.Write(HuffmanConsts.Header.ToArray(), 0, HuffmanConsts.Header.Count)

This IEnumerable.ToArray() extension method is from System.Linq.

Alternately, you can store directly the byte array and use a property to return its clones. This is a simpler variant of the first approach described by ErikE. No need for ReadOnlyCollection anymore.

public static class HuffmanConsts
{
    // output format: Header, serialized tree (prefix), DataDelimiter, coded data (logical blocks are 8 byte large, Little Endian)
    public const string Extension = ".huff";
    private static byte[] _header = new byte[] {0x7B, 0x68, 0x75, 0x7C, 0x6D, 0x7D, 0x66, 0x66}; // string {hu|m}ff
    private static byte[] _dataDelimiter = BitConverter.GetBytes(0L); // eight binary zeroes, regardless of endianity
    public byte[] Header { get { return (byte[])_header.Clone(); } }
    public byte[] DataDelimiter { get { return (byte[])_dataDelimiter.Clone(); } }
}

I am not in favor of this solution, because these properties do non-trivial amount of work (allocation; still O(1), though). Converting them into Get* methods would communicate the idea and is the way to go when publishing immutable arrays, according to the Framework Design Guidelines.


As Ivan Stoev commented under the question:

Stream requires byte[]. Dot. You need to sacrifice either some OOP concepts or performance. The choice is yours.

The reason is (I guess) that the byte array is passed directly to an underlying syscall and other collections have incompatible inner structure. Therefore, I believe, it is not possible to evade the overhead introduced by new array allocation on each call if you want to keep the current implementation of HuffmanConsts.

Community
  • 1
  • 1
Palec
  • 12,743
  • 8
  • 69
  • 138
  • Learn something new every day. I'm intermediate in my C# and just didn't have `Clone` on my radar, so thanks for that! It's better than `Buffer.BlockCopy` (which is still useful for its ability to copy portions of targets to portions of destinations, which `Clone` obviously won't do). Do you mind if I update my answer to use `Clone` (and leave in comments about the original recommendation of `BlockCopy` so your answer still makes sense)? – ErikE Jan 11 '16 at 19:56
  • Yes, it’s OK for me. – Palec Jan 12 '16 at 07:52