
I want to store an sbyte[,] on disk using as little space as possible (it can't take more than a few seconds to save or load, though) and get it back at a later time.

I can't serialize it to XML: Cannot serialize object of type System.SByte[,]. Multidimensional arrays are not supported.

And I can't convert it to a MemoryStream: Cannot convert from 'sbyte[,]' to 'int'

Besides creating a text file and looping it out piece by piece, what options are there?

If it makes any difference, the array can be upwards of 100,000 x 100,000 in size. The file also needs to be usable across different operating systems and computers.


Update.

I went with flattening my array down to a 1D sbyte[], converting it to a stream, and saving that to disk along with a separate file containing the dimensions.

Stream stream = new MemoryStream(byteArray);

I used this answer as a base for saving the stream to disk: https://stackoverflow.com/a/5515894/937131
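For anyone wanting the whole picture, here is a minimal sketch of that approach (the class, method names, file names, and the exact dimensions-file format are just illustrative choices, not my exact code):

public static class FlatArrayStore
{
    // Sketch only: flatten the 2D array to raw bytes, stream them to one file,
    // and write the dimensions to a small companion file.
    public static void Save(string dataPath, string dimsPath, sbyte[,] input)
    {
        var flattened = new byte[input.Length];
        Buffer.BlockCopy(input, 0, flattened, 0, input.Length); // sbyte and byte are both 1 byte wide

        using (var stream = new MemoryStream(flattened))
        using (var fileStream = File.Create(dataPath))
        {
            stream.CopyTo(fileStream);
        }

        // Companion file holds "rows,columns" as plain text.
        File.WriteAllText(dimsPath, input.GetLength(0) + "," + input.GetLength(1));
    }

    public static sbyte[,] Load(string dataPath, string dimsPath)
    {
        var dims = File.ReadAllText(dimsPath).Split(',');
        int rows = int.Parse(dims[0]);
        int cols = int.Parse(dims[1]);

        var flattened = File.ReadAllBytes(dataPath);
        var result = new sbyte[rows, cols];
        Buffer.BlockCopy(flattened, 0, result, 0, flattened.Length);
        return result;
    }
}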

This is a test case I wrote for the flattening and unflattening, in case anyone else finds it useful.

[TestMethod]
public void sbyteTo1dThenBack()
{
    sbyte[,] start = new sbyte[,]
    {
        {1, 2},
        {3, 4},
        {5, 6},
        {7, 8},
        {9, 10}
    };

    sbyte[] flattened = new sbyte[start.Length];
    System.Buffer.BlockCopy(start, 0, flattened, 0, start.Length * sizeof(sbyte));

    sbyte[,] andBackAgain = new sbyte[5, 2];
    Buffer.BlockCopy(flattened, 0, andBackAgain, 0, flattened.Length * sizeof(sbyte));

    var equal =
        start.Rank == andBackAgain.Rank &&
        Enumerable.Range(0, start.Rank).All(dimension => start.GetLength(dimension) == andBackAgain.GetLength(dimension)) &&
        start.Cast<sbyte>().SequenceEqual(andBackAgain.Cast<sbyte>());

    Assert.IsTrue(equal);
}
  • Possible duplicate of [How to save/restore serializable object to/from file?](http://stackoverflow.com/questions/6115721/how-to-save-restore-serializable-object-to-from-file) – Cᴏʀʏ Jan 13 '16 at 20:55
  • 100,000 x 100,000 in size... Can't take more than a few seconds to load or save... Hmmm... Pick one? – Luke Joshua Park Jan 13 '16 at 20:56
  • well it can be upwards of 100k by 100k, so whatever speed I can get will be good.. most of the time it will probably be around 50k by 40k – JensB Jan 13 '16 at 20:58
  • 1
    Nevermind, scratch the .NET binary serialization -- if you need a format that's usable across OSes and computers, you will need to use something more generic like XML or JSON. – Cᴏʀʏ Jan 13 '16 at 20:58
  • 1
    Is there a consistent size for the 2D array? If not, things become a bit more complicated, but, if so, you should be able to flatten it to a 1D byte array, store it as a binary file on the disk (via `File.WriteAllBytes()`), then reconstitute it as a 2D `SByte` array. The exact mechanisms for doing so will be up to the reader to determine. – willaien Jan 13 '16 at 20:58
  • 2
    If there isn't a consistent size for the 2D byte array, I'd suggest creating a metadata object with the information, serializing that separately, then using it to determine how to reconstitute the `SByte` array. – willaien Jan 13 '16 at 20:59
  • 1
    The latest versions of JSON.NET can serialize multi-dimensional arrays, and it's probably pretty fast. – Cᴏʀʏ Jan 13 '16 at 21:01
  • 1
    I'd also like to note that a 100Kx100K 2D `SByte` array is, at a minimum, about 9GB worth of raw data. No matter how you slice it, it's going to be expensive to load up (You'll probably need to stream the data in row by row and load it into the resulting 2D array) – willaien Jan 13 '16 at 21:02
  • I will test json.net and the 2D flattening idea and see which goes faster. Thank you. As for the 9 GB, the computer reading it in has a minimum of 32 GB of RAM. – JensB Jan 13 '16 at 21:05
  • 1
    It's not the amount of data in RAM, it's the fact that streaming the data to and from disk will never be measured in seconds, unless you have some really fast disk arrays (IE SSDs in RAID 10, or really fast SSDs). Barring compression (which is another subject, but might be worth thinking about in this case), it will be at a minimum 9.3GB to store a 100kX100k array, which will take 19 seconds to read from disk at 500MB/s – willaien Jan 13 '16 at 21:07
  • @willaien duly noted ;) – JensB Jan 13 '16 at 21:08
  • Why use XML or JSON for binary data, especially when speed and size are concerns? It'll take more time and increase the amount of data you need to write. Without more information about what data we're talking about here, I'd just store the width and height and the actual bytes themselves in a binary file, perhaps using some form of compression. – Pieter Witvoet Jan 13 '16 at 21:29
  • @PieterWitvoet to my understanding binary formatters don't like to work cross-platform? – JensB Jan 13 '16 at 21:38
  • It's already a (s)byte array - no binary formatter required. Flatten it, as willaien said, and just write it to a file. When storing the width and height you do need to keep byte-order in mind, but as long as you always use the same order that shouldn't be a problem. – Pieter Witvoet Jan 13 '16 at 21:59
  • Thanks for all the help, got it to work with flattening and saving that to a stream. I actually used https://dotnetzip.codeplex.com/zip to store it compressed so it doesn't use such crazy amounts of disk space. – JensB Jan 15 '16 at 14:26

1 Answer


As per my comments, I feel that writing out the byte-array equivalent of everything is the way to go here. This may not be the most efficient way to do it, and it lacks a lot of error-handling code that you will need to supply, but it works in my tests.

Edit: Also, BitConverter.ToInt32() may depend on the "Endianness" of your processor. See Scott Chamberlain's comments on how to fix this if you intend to use this code on ARM or other non-x86 systems.

public static class ArraySerializer
{
    public static void SaveToDisk(string path, SByte[,] input)
    {
        var length = input.GetLength(1);
        var height = input.GetLength(0);
        using (var fileStream = File.OpenWrite(path))
        {
            fileStream.Write(BitConverter.GetBytes(length), 0, 4);//Store the length
            fileStream.Write(BitConverter.GetBytes(height), 0, 4);//Store the height
            var lineBuffer = new byte[length];
            for (int h = 0; h < height; h++) 
            {
                for (int l = 0; l < length; l++) 
                {
                    unchecked //Preserve sign bit
                    {
                        lineBuffer[l] = (byte)input[h,l];
                    }
                }
                fileStream.Write(lineBuffer,0,length);
            }

        }
    }
    public static SByte[,] ReadFromDisk(string path)
    {
        using (var fileStream = File.OpenRead(path))
        {
            int length;
            int height;
            var intBuffer = new byte[4];
            fileStream.Read(intBuffer, 0, 4);
            length = BitConverter.ToInt32(intBuffer, 0);
            fileStream.Read(intBuffer, 0, 4);
            height = BitConverter.ToInt32(intBuffer, 0);
            var output = new SByte[height, length]; //Note, for large allocations, this can fail... Would fail regardless of how you read it back
            var lineBuffer = new byte[length];
            for (int h = 0; h < height; h++)
            {
                fileStream.Read(lineBuffer, 0, length);
                for (int l = 0; l < length; l++)
                    unchecked //Preserve sign bit
                    {
                        output[h,l] = (SByte)lineBuffer[l];
                    }
            }
            return output;
        }
    }
}
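As a follow-up to the endianness note above, here is a minimal sketch of one way to pin the two header fields to little-endian regardless of the machine (the helper names are mine; they would replace the bare BitConverter.GetBytes/BitConverter.ToInt32 calls for the length and height):

// Assumption: the file format always stores length and height as little-endian.
private static byte[] Int32ToLittleEndian(int value)
{
    var bytes = BitConverter.GetBytes(value);
    if (!BitConverter.IsLittleEndian)
        Array.Reverse(bytes); // on a big-endian machine, flip to little-endian
    return bytes;
}

private static int Int32FromLittleEndian(byte[] buffer)
{
    if (!BitConverter.IsLittleEndian)
        Array.Reverse(buffer); // flip back to native order before converting
    return BitConverter.ToInt32(buffer, 0);
}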

Here's how I tested it:

void Main()
{
    var test = new SByte[20000, 25000];
    var length = test.GetLength(1);
    var height = test.GetLength(0);
    var lineBuffer = new byte[length];
    var random = new Random();
    //Populate with random data
    for (int h = 0; h < height; h++) 
    {
        random.NextBytes(lineBuffer);
        for (int l = 0; l < length; l++)
        {
            unchecked //Let's use first bit as a sign bit for SByte
            {
                test[h,l] = (SByte)lineBuffer[l];
            }
        }
    }
    var sw = Stopwatch.StartNew();
    ArraySerializer.SaveToDisk(@"c:\users\ed\desktop\test.bin", test);
    Console.WriteLine(sw.Elapsed);
    sw.Restart();
    var test2 = ArraySerializer.ReadFromDisk(@"c:\users\ed\desktop\test.bin");
    Console.WriteLine(sw.Elapsed);
    Console.WriteLine(test.GetLength(0) == test2.GetLength(0));
    Console.WriteLine(test.GetLength(1) == test2.GetLength(1));
    Console.WriteLine(Enumerable.Cast<SByte>(test).SequenceEqual(Enumerable.Cast<SByte>(test2))); //Dirty hack to compare contents... takes a very long time
}

On my system (with an SSD), that test takes ~2.7s to write or read the contents of the 20kx25k array. To add compression, you can just wrap the FileStream in a GZipStream.
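A hedged sketch of what a compressed variant of the writer might look like (the reader would wrap its FileStream in a GZipStream with CompressionMode.Decompress in the same way; SaveToDiskCompressed is just an illustrative name):

// Requires: using System.IO.Compression;
public static void SaveToDiskCompressed(string path, SByte[,] input)
{
    var length = input.GetLength(1);
    var height = input.GetLength(0);
    using (var fileStream = File.OpenWrite(path))
    using (var zipStream = new GZipStream(fileStream, CompressionMode.Compress))
    {
        zipStream.Write(BitConverter.GetBytes(length), 0, 4); // store the length
        zipStream.Write(BitConverter.GetBytes(height), 0, 4); // store the height
        var lineBuffer = new byte[length];
        for (int h = 0; h < height; h++)
        {
            for (int l = 0; l < length; l++)
                unchecked { lineBuffer[l] = (byte)input[h, l]; } // preserve sign bit
            zipStream.Write(lineBuffer, 0, length);
        }
    }
}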

  • I would not make `lineBuffer` have `length` for its size; that is going to require huge chunks of memory. I would just make it 4096 and do multiple write calls per "row". Also be careful with `BitConverter.GetBytes`: it is processor-endianness specific. If the file is going to be transferred from machine to machine, check `if (!BitConverter.IsLittleEndian)` and do an `Array.Reverse()` on the output of `BitConverter.GetBytes` if you find yourself on a big-endian system. – Scott Chamberlain Jan 13 '16 at 22:18
  • @ScottChamberlain: Understood, but, this was a quick "Proof of Concept". Doing that will require math to ensure that the trailing remainder is handled properly, and is certainly an enhancement that JensB is welcome to try to implement. I will add a warning about handling Endianness. – willaien Jan 13 '16 at 22:20
  • The reason I did not post an answer was that I was still trying to do that math when you posted yours :). Your answer is good enough, so I stopped and just upvoted you instead. – Scott Chamberlain Jan 13 '16 at 22:21