2

I have to write 4GB short[] arrays to disk and read them back, so I have found a function to write the arrays, but I am struggling to write the code that reads the array back from disk. I normally code in other languages, so please forgive me if my attempt is a bit pathetic so far:

using UnityEngine;
using System.Collections;
using System.IO;

public class RWShort : MonoBehaviour {

    public static void WriteShortArray(short[] values, string path)
    {
        using (FileStream fs = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write))
        {
            using (BinaryWriter bw = new BinaryWriter(fs))
            {
                foreach (short value in values)
                {
                    bw.Write(value);
                }
            }
        }
    } //Above is fine, here is where I am confused: 


    public static short[] ReadShortArray(string path) 
    {
        byte[] thisByteArray = File.ReadAllBytes(path);
        short[] thisShortArray = new short[thisByteArray.Length / 2];
        for (int i = 0; i < thisByteArray.Length; i += 2)
        {
            thisShortArray[i / 2] = ? convert from byte array;
        }


        return thisShortArray;
    }   
}
bandybabboon
  • 2,210
  • 1
  • 23
  • 33
  • You could probably just read all the bytes and [convert them to short](https://stackoverflow.com/questions/1104599/convert-byte-array-to-short-array-in-c-sharp), but 4 GB of data is a lot! You might have memory issues. – the_lotus Feb 05 '20 at 19:49
  • I've never seen this kind of variable declaration in C# before: `thisShort : short[] = new short[];` – Sam Axe Feb 05 '20 at 19:51
  • Hi, sorry, I fixed that. It's some audio analysis data which takes 20-30 minutes to compute, so if I can save it to disk I can save that time to study it. It's 50*44100*600 short values. – bandybabboon Feb 05 '20 at 19:53
  • The second code will not compile since nothing is returned and nothing is stored from the Read. The code is trying to split the input into two equal pieces, but if it is not a multiple of 4 you will have a left half and a right half with an odd number of bytes, and reading the last short (Int16) will also not work. – jdweng Feb 05 '20 at 19:53
  • I think instead of relying on the length of the array I'd use `while (fs.Position < fs.Length)`. I'd also switch to a `LinkedList` so I didn't have to allocate 4GB of contiguous memory. LinkedList keeps a pointer to the next element/item, so the memory allocation doesn't need to be contiguous. – Sam Axe Feb 05 '20 at 19:54
  • The ReadAllBytes option sounds very reasonable if I can read 4 GB in 1-2 minutes of processing. – bandybabboon Feb 05 '20 at 19:55
  • related: https://stackoverflow.com/q/3206391/103167 – Ben Voigt Feb 05 '20 at 20:21

2 Answers

3

Shorts are two bytes, so you have to read two bytes at a time. I'd also recommend using a yield return like this so that you aren't trying to pull everything into memory in one go. Though if you need all of the shorts together at once, that won't help you; it depends on what you're doing with them, I guess.

void Main()
{
    short[] values = new short[] {
        1, 999, 200, short.MinValue, short.MaxValue
    };

    WriteShortArray(values, @"C:\temp\shorts.txt");

    foreach (var shortInfile in ReadShortArray(@"C:\temp\shorts.txt"))
    {
        Console.WriteLine(shortInfile);
    }
}

public static void WriteShortArray(short[] values, string path)
{
    using (FileStream fs = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write))
    {
        using (BinaryWriter bw = new BinaryWriter(fs))
        {
            foreach (short value in values)
            {
                bw.Write(value);
            }
        }
    }
}

public static IEnumerable<short> ReadShortArray(string path)
{
    using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
    using (BinaryReader br = new BinaryReader(fs))
    {
        byte[] buffer = new byte[2];
        // read two bytes at a time and combine them (little-endian) into one short
        while (br.Read(buffer, 0, 2) > 0)
            yield return (short)(buffer[0] | (buffer[1] << 8));
    }
}

You could also define it this way, taking advantage of the BinaryReader:

public static IEnumerable<short> ReadShortArray(string path)
{
    using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
    using (BinaryReader br = new BinaryReader(fs))
    {
        while (br.BaseStream.Position < br.BaseStream.Length)
            yield return br.ReadInt16();
    }
}
Michael Jones
  • 1,900
  • 5
  • 12
  • Hey, thanks! I'm very grateful indeed. I can see that I will be able to implement it; I'd be very surprised if I can't run it now. It's for complex music instrument identification, I am writing some kind of lab experiments, and the maths is easier than the data type conversions and memory management! – bandybabboon Feb 05 '20 at 20:19
3

Memory-mapping the file is your friend: there's a MemoryMappedViewAccessor.ReadInt16 function that will let you read the data, with type short, directly out of the OS disk cache. There's also a Write() overload that accepts an Int16, and ReadArray and WriteArray functions if you are calling functions that need a traditional .NET array.

Overview of using Memory-mapped files in .NET on MSDN
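For illustration, here is a rough sketch of what the memory-mapped route could look like. The class and method names, the file-sizing via FileInfo.Length, and the overall shape are my assumptions, not code from this answer:

using System.IO;
using System.IO.MemoryMappedFiles;

public static class MmfShortIO
{
    // Write: map the target file at the required size and copy the whole array in.
    public static void WriteShorts(short[] values, string path)
    {
        long byteLength = (long)values.Length * sizeof(short);
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Create, null, byteLength))
        using (var view = mmf.CreateViewAccessor(0, byteLength))
        {
            view.WriteArray(0, values, 0, values.Length);
        }
    }

    // Read: map the existing file and copy it back into a short[].
    public static short[] ReadShorts(string path)
    {
        long byteLength = new FileInfo(path).Length;
        short[] result = new short[byteLength / sizeof(short)];
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var view = mmf.CreateViewAccessor(0, byteLength))
        {
            view.ReadArray(0, result, 0, result.Length);
            // Individual values can also be read in place, e.g. view.ReadInt16(0),
            // without materializing the whole array.
        }
        return result;
    }
}

ReadArray/WriteArray still copy into or out of a managed array; ReadInt16 and the Int16 Write overload avoid even that copy when you only need parts of the data.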

If you want to do it with ordinary file I/O, use a block size of 1 or 2 megabytes and the Buffer.BlockCopy function to move data en masse between byte[] and short[], and use the FileStream functions that accept a byte[]. Forget about BinaryWriter or BinaryReader, forget about doing 2 bytes at a time.
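For example, the read side of that approach could look roughly like this (the 1 MB block size and the method name are my choices; note that Buffer.BlockCopy takes int byte offsets, so this particular sketch only covers arrays up to about 2 GB; beyond that you would split the data or use the memory-mapped route above):

using System;
using System.IO;

public static class BlockedShortIO
{
    public static short[] ReadShortArrayBlocked(string path)
    {
        const int blockBytes = 1 << 20;              // 1 MB staging buffer
        byte[] buffer = new byte[blockBytes];
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            short[] result = new short[fs.Length / sizeof(short)];
            int destByteOffset = 0;                  // int, because Buffer.BlockCopy offsets are ints
            int read;
            while ((read = fs.Read(buffer, 0, blockBytes)) > 0)
            {
                // copy the raw bytes of the whole block into the short[] in one call,
                // instead of converting two bytes at a time
                Buffer.BlockCopy(buffer, 0, result, destByteOffset, read);
                destByteOffset += read;
            }
            return result;
        }
    }
}

The write side is symmetric: Buffer.BlockCopy a block's worth of the short[] into the byte[] staging buffer and call FileStream.Write for each block.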

It's also possible to do the I/O directly into a .NET array with the help of p/invoke; see my answer using ReadFile and passing the FileStream object's SafeFileHandle property here. But even though this has no extra copies, it still shouldn't keep up with the memory-mapped ReadArray and WriteArray calls.

Ben Voigt
  • 277,958
  • 43
  • 419
  • 720