
I have voxel files of 1 billion voxels; every voxel is true/false and is kept in a 1D boolean array.

What is a good way to save it to disk, for example as raw bytes or as a 0100010101 ASCII file, so that I can read the file back into memory quickly and efficiently?

At the moment I write files to disk using:

    savePath = System.IO.Directory.GetParent(Application.dataPath).ToString() + "/Saved_Files";
    var sw : System.IO.StreamWriter;

I don't know the best way to read and write 1-2 GB files.

This is what I have written so far:

    function saveBW(){
        var timeString = DateTime.Now.ToString("HH-mm");
        var fileNameFromFolder = Path.GetFileNameWithoutExtension(QPath[QDone]);
        fileNameFromFolder = stripTrailingSlash(fileNameFromFolder);

        PLYname = "MK5_aliased_" + fileNameFromFolder + "_" + timeString + ".Bo0L";
        var str = "";
        var SW2 = new System.IO.StreamWriter(savePath + "/" + PLYname);

        for (var tr = 0; tr < mesher.supernormous.Length; tr++)
        {
            // read element tr (testing the array itself gives the same value every time)
            str += mesher.supernormous[tr] ? "1" : "0";
            // write every 256 characters and start a fresh chunk,
            // otherwise earlier characters get written again and again
            if (tr % 256 == 255) { SW2.Write(str); str = ""; }
        }

        SW2.Write(str); // write the remaining characters
        SW2.Flush();
        SW2.Close();
    }


bandybabboon
  • You could just group 8 bits together and write them as bytes. – Manfred Radlwimmer Mar 09 '17 at 14:04
  • I'm mostly confused about how to write the simplest parser and how to segment the read, so that I know where I am in the file while reading. – bandybabboon Mar 09 '17 at 14:06
  • Don't reinvent the wheel. https://msdn.microsoft.com/en-us/library/system.io.binarywriter(v=vs.110).aspx – hyankov Mar 09 '17 at 14:06
  • I haven't done this before, so this is mostly theoretical. But I imagine you could take 8 booleans at a time and turn them into a `byte`, applying a bitmask to that `byte` for each boolean value. (I haven't done it, so I don't know exactly what that math looks like, but for any given set of 8 bits you could add the numeric value corresponding to each set bit's position: 1, 2, 4, 8, 16, and so on up to 128.) Writing that stream of bytes to a file would take 1/8 the space of writing one byte per boolean value. – David Mar 09 '17 at 14:09
  • @comprehensible Without knowing what your data looks like (the actual code, not just a vague description) it's hard to tell. – Manfred Radlwimmer Mar 09 '17 at 14:10
  • OK, I added the current code. – bandybabboon Mar 09 '17 at 14:17
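
The byte-packing idea from these comments can be sketched in C# as follows. This is a minimal standalone sketch (C# 9 top-level statements; in Unity the functions would become static methods of a class). The function names and the file layout (a 4-byte voxel count followed by the packed bytes) are illustrative assumptions, not an established format. Writing the count first also addresses the "where am I in the read" question: the reader knows exactly how many bytes follow.

```csharp
using System;
using System.IO;

// Minimal sketch; PackBits/UnpackBits/SaveVoxels/LoadVoxels are hypothetical names.
// Assumed file layout (not a standard format):
//   4-byte little-endian voxel count, then (count + 7) / 8 packed bytes.

// Pack 8 booleans into each byte: bit k of byte i holds bits[i * 8 + k].
static byte[] PackBits(bool[] bits)
{
    var bytes = new byte[(bits.Length + 7) / 8]; // round up so a partial last byte fits
    for (int i = 0; i < bits.Length; i++)
    {
        if (bits[i])
            bytes[i >> 3] |= (byte)(1 << (i & 7)); // set bit (i % 8) of byte (i / 8)
    }
    return bytes;
}

// Reverse of PackBits. count is the original number of booleans,
// because the last byte may contain unused padding bits.
static bool[] UnpackBits(byte[] bytes, int count)
{
    var bits = new bool[count];
    for (int i = 0; i < count; i++)
        bits[i] = (bytes[i >> 3] & (1 << (i & 7))) != 0;
    return bits;
}

static void SaveVoxels(string path, bool[] voxels)
{
    using var writer = new BinaryWriter(File.Create(path));
    writer.Write(voxels.Length);    // header: how many voxels follow
    writer.Write(PackBits(voxels)); // raw packed bytes, no length prefix
}

static bool[] LoadVoxels(string path)
{
    using var reader = new BinaryReader(File.OpenRead(path));
    int count = reader.ReadInt32();                    // read the header first...
    byte[] packed = reader.ReadBytes((count + 7) / 8); // ...so we know exactly how much data follows
    return UnpackBits(packed, count);
}
```

With this layout, a billion voxels take about 125 MB on disk, while the loaded `bool[]` keeps one byte per voxel in memory.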

1 Answer


Booleans aren't bit-sized in .NET (each element of a `bool[]` occupies a full byte), so they aren't a good storage format for the kind of data you want. Instead, use a `BitArray`: it still gives you all the manipulation you need (read a single bit value, write a single bit value), and it lets you load and store the whole array as a `byte[]` with eight bits per byte. This makes persistence quite easy:

    var data = new BitArray(File.ReadAllBytes("MyFile.bin"));
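
A fuller round trip with `BitArray` might look like this sketch (the function names are hypothetical). Because the file holds whole bytes, `new BitArray(byte[])` always yields a multiple of 8 bits, so the sketch assumes the original voxel count is known separately and trims the padding through the `Length` property.

```csharp
using System;
using System.Collections;
using System.IO;

// Minimal sketch; SaveWithBitArray/LoadWithBitArray are hypothetical names.
// The file holds only the packed bytes, so the voxel count must be known separately.

static void SaveWithBitArray(string path, bool[] voxels)
{
    var bits = new BitArray(voxels);             // bool[] -> packed bits
    var bytes = new byte[(bits.Length + 7) / 8]; // 8 voxels per byte, rounded up
    bits.CopyTo(bytes, 0);                       // voxel i -> bit (i % 8) of byte (i / 8)
    File.WriteAllBytes(path, bytes);
}

static bool[] LoadWithBitArray(string path, int voxelCount)
{
    var data = new BitArray(File.ReadAllBytes(path)); // Length = 8 * file size
    data.Length = voxelCount;                         // drop padding bits from the last byte
    var voxels = new bool[voxelCount];
    data.CopyTo(voxels, 0);                           // packed bits -> bool[] for fast access
    return voxels;
}
```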

Of course, how efficient this really is can only be settled by profiling. You might also not want to load the data until it's actually required, in which case some sort of paging solution might be better, but that's beyond the scope of your question as it stands.
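
As a very small illustration of on-demand access (not a full paging system, which would also cache whole blocks), a single voxel can be read straight from the file with a seek. This assumes the file contains only the bytes produced by `BitArray.CopyTo(byte[])`, so voxel i is bit i % 8 of byte i / 8; `ReadVoxel` is a hypothetical name.

```csharp
using System;
using System.IO;

// Minimal sketch; ReadVoxel is a hypothetical name. Assumes the file contains only
// the packed bytes, laid out like BitArray.CopyTo(byte[]): voxel i is bit (i % 8) of byte (i / 8).
// Each call seeks and reads one byte; a real paging scheme would cache larger blocks.
static bool ReadVoxel(FileStream stream, long index)
{
    stream.Seek(index / 8, SeekOrigin.Begin); // jump to the byte holding this voxel
    int b = stream.ReadByte();                // -1 means we are past the end of the file
    if (b < 0)
        throw new ArgumentOutOfRangeException(nameof(index));
    return (b & (1 << (int)(index % 8))) != 0;
}
```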

Luaan
  • I have unlimited disk space, and I have to do about 100 read operations on every boolean. Booleans are stored in 8-bit pieces of memory, which are faster to read than bits. Would it be easy to convert between bits and booleans when loading and saving? I'm not really a programmer; I'm an engineer who learned programming at home. – bandybabboon Mar 09 '17 at 14:16
  • @comprehensible Of course; `BitArray` also has a constructor that takes `bool[]`, and the `CopyTo` method works with both `bool[]` and `byte[]`. It may be worth it to read and write the bytes manually if the overhead is significant in your scenario; it really isn't hard, just simple math and getting the edge cases right. Think of it as a jagged array, where each `byte` corresponds to a `bool[]` with 8 elements. Reading an element of the byte is `(byteVal & (1 << index)) != 0`, and writing is `byteVal |= (byte)((boolVal ? 1 : 0) << index)` (this only ever sets bits, so it works only if you don't reuse the byte). – Luaan Mar 09 '17 at 14:24
  • @comprehensible This allows you to keep the file small while keeping the in-memory representation fast. Though I'd still profile the thing and make sure it's actually worth keeping the bits in a `bool[]`; assumptions aren't a good way to decide that nowadays :) Manipulating a bool can be faster/cheaper, but at the same time you now need about 1 GB of memory (one byte per voxel) instead of about 125 MB (one bit per voxel). Things like this are often decided by memory access patterns, which are quite tricky to analyze properly. Is the access random or sequential? Is there a better organization than a 1D array, e.g. spatial subdivision? – Luaan Mar 09 '17 at 14:25
  • @comprehensible, even if you load bytes, [addressing an individual bit](http://stackoverflow.com/q/4854207/1997232) is not complicated, and your data is huge (memory-wise), so it makes perfect sense to follow Luaan's recommendation. – Sinatr Mar 09 '17 at 14:26
  • Thanks, you have impressed upon me the complexity of the task. After a lot of thought, it makes sense to use sbyte instead of bool in some 3D applications, because it is also 8 bits but has 256 possible values, which can be used to store much more information about each voxel. To access 3D data in a 1D array you can compute the index from the coordinates: z*w*h + y*w + x. I couldn't figure out how to make a 3D boolean array; perhaps a 3D sbyte array is possible. Thanks for the code, it's very cool; I'll be running that for now. – bandybabboon Mar 09 '17 at 19:29
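
The 3D indexing and per-bit access discussed in these comments can be combined into a packed voxel grid, sketched below. The x-fastest layout (index = z*w*h + y*w + x), the function names, and the use of a packed `byte[]` are assumptions for illustration. `SetBit` also clears bits, so unlike the plain `|=` form it still works when a byte is reused.

```csharp
using System;

// Minimal sketch; VoxelIndex/GetBit/SetBit are hypothetical names.
// Index of voxel (x, y, z) in a 1D array laid out x-fastest, then y, then z.
// long arithmetic avoids overflow for large grids.
static long VoxelIndex(int x, int y, int z, int width, int height)
    => (long)z * width * height + (long)y * width + x;

// Read one voxel from a packed byte array (8 voxels per byte, LSB first).
static bool GetBit(byte[] packed, long index)
    => (packed[index >> 3] & (1 << (int)(index & 7))) != 0;

// Set or clear one voxel in place.
static void SetBit(byte[] packed, long index, bool value)
{
    byte mask = (byte)(1 << (int)(index & 7));
    if (value)
        packed[index >> 3] |= mask;          // turn the bit on
    else
        packed[index >> 3] &= (byte)~mask;   // turn the bit off, leaving the others alone
}
```

For a grid of width w, height h and depth d, the backing array would be `new byte[((long)w * h * d + 7) / 8]`.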