
I've made a neural network and now I need to save the results of the training process into a local file. In total, there are 7,155,264 values. I've tried a loop like this:

string weightsString = "";
string biasesString = "";

for (int l = 1; l < layers.Length; l++)
{
    for (int j = 0; j < layers[l].Length; j++)
    {
        for (int k = 0; k < layers[l - 1].Length; k++)
        {
            weightsString += weights[l][j, k] + "\n";
        }

        biasesString += biases[l][j] + "\n";
    }
}

File.WriteAllText(@"path", weightsString + "\n" + biasesString);

But it literally takes forever to go through all of the values. Is there no way to write the contents directly without having to write them in a string first?

(`weights` is a `double[][,]`, while `biases` is a `double[][]`.)

  • Whoa, for one, use StringBuilder. Or go straight to IO. Currently you're thrashing the GC with tons of useless new strings on every concat. – Zer0 May 06 '22 at 12:07
  • This sounds like a job for `StreamWriter` on top of a `FileStream`, writing individual lines, i.e. `writer.WriteLine(weights[l][j,k])`, and job done, no? Note: `File.CreateText(path)` will give you exactly this setup... – Marc Gravell May 06 '22 at 12:11
  • As Zer0 already said, write directly to the file; see this answer: https://stackoverflow.com/a/7569993/2598770 (not the accepted answer of the question) – Rand Random May 06 '22 at 12:12
  • Using string concatenation is really going to slow things down. Each concatenation creates a new string (copying everything built so far) for the garbage collector to clean up, and with lots of them that is going to happen very often. As @Zer0 pointed out, use a `new StringBuilder()` or write the output directly into a file without string concatenation. – Paul Sinnema May 06 '22 at 12:13
  • Thanks everyone, StringBuilder is pretty much instantaneous – Elia Giaccardi Old May 06 '22 at 12:14
  • I wouldn't recommend StringBuilder. Sure, it is better than what you are doing now, but why hold that data in memory instead of writing directly to the file? – Rand Random May 06 '22 at 12:16
  • @RandRandom and how do I do that? – Elia Giaccardi Old May 06 '22 at 12:18
  • How will you be consuming this file? Is it intended to be human-readable (in which case, you will need to use strings)? If, however, it's intended to be read by code, then you'll be better off writing the doubles in binary form. (I can't imagine any human being happy about having to read seven million numbers...) – Matthew Watson May 06 '22 at 12:38
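Marc Gravell's `File.CreateText` suggestion can be sketched as follows. This is an illustration, not code from the thread: it assumes .NET Core 3.0 or later (where `double.ToString()` round-trips exactly), fills a made-up 3-4-2 network in place of the real trained data, and writes to a hypothetical `network.txt`. It keeps the question's file layout (all weights, an empty line, then all biases) but streams every value straight to disk, so no large string is ever built.

```csharp
using System;
using System.Globalization;
using System.IO;

// Hypothetical 3 -> 4 -> 2 network standing in for the question's trained data
double[][] layers = { new double[3], new double[4], new double[2] };
var weights = new double[layers.Length][,];
var biases = new double[layers.Length][];
var rng = new Random(1);
for (int l = 1; l < layers.Length; l++)
{
    weights[l] = new double[layers[l].Length, layers[l - 1].Length];
    biases[l] = new double[layers[l].Length];
    for (int j = 0; j < layers[l].Length; j++)
    {
        biases[l][j] = rng.NextDouble();
        for (int k = 0; k < layers[l - 1].Length; k++)
            weights[l][j, k] = rng.NextDouble();
    }
}

string path = "network.txt"; // hypothetical file name
// File.CreateText creates (or overwrites) the file and returns a buffered StreamWriter
using (StreamWriter writer = File.CreateText(path))
{
    writer.NewLine = "\n"; // match the "\n" separators used in the question

    // Same layout as the question: all weights, one empty line, then all biases
    for (int l = 1; l < layers.Length; l++)
        for (int j = 0; j < layers[l].Length; j++)
            for (int k = 0; k < layers[l - 1].Length; k++)
                writer.WriteLine(weights[l][j, k].ToString(CultureInfo.InvariantCulture));

    writer.WriteLine();

    for (int l = 1; l < layers.Length; l++)
        for (int j = 0; j < layers[l].Length; j++)
            writer.WriteLine(biases[l][j].ToString(CultureInfo.InvariantCulture));
}
```

Formatting with `CultureInfo.InvariantCulture` keeps the decimal separator a `.` regardless of the machine's regional settings, so the file parses back the same way everywhere.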

4 Answers


First of all, writing 7 million values will take some time no matter how you do it. I'd suggest splitting the weights and biases into two files and writing them on the fly; there is no need to keep them all in memory until you are done.

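using System.IO;

// append: false overwrites files left over from a previous run instead of adding to them
using StreamWriter weightStream = new("weights.txt", append: false);
using StreamWriter biasStream = new("biases.txt", append: false);

for (int l = 1; l < layers.Length; l++)
{
    for (int j = 0; j < layers[l].Length; j++)
    {
        for (int k = 0; k < layers[l - 1].Length; k++)
        {
            // TextWriter has WriteLine(double) but no WriteLineAsync(double) overload;
            // StreamWriter buffers the output, so synchronous writes are fast here
            weightStream.WriteLine(weights[l][j, k]);
        }

        biasStream.WriteLine(biases[l][j]);
    }
}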
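To load the values back, the same loop order can be replayed while reading the two files line by line. This is a sketch under assumptions not stated in the answer: the network shape (`layers`) is the same as when the files were written, the hypothetical `Load` helper parses with the current culture to match `WriteLine(double)`, and the demo writes tiny stand-in files first so it runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Reads weights.txt / biases.txt (one value per line, same loop order as when saving)
// back into jagged arrays. Assumes `layers` has the same shape as when the files were written.
(double[][,] weights, double[][] biases) Load(double[][] layers, string weightsPath, string biasesPath)
{
    var weights = new double[layers.Length][,];
    var biases = new double[layers.Length][];
    // File.ReadLines streams the files lazily instead of loading them all at once
    using IEnumerator<string> w = File.ReadLines(weightsPath).GetEnumerator();
    using IEnumerator<string> b = File.ReadLines(biasesPath).GetEnumerator();
    for (int l = 1; l < layers.Length; l++)
    {
        weights[l] = new double[layers[l].Length, layers[l - 1].Length];
        biases[l] = new double[layers[l].Length];
        for (int j = 0; j < layers[l].Length; j++)
        {
            for (int k = 0; k < layers[l - 1].Length; k++)
            {
                if (!w.MoveNext()) throw new InvalidDataException("weights file is too short");
                weights[l][j, k] = double.Parse(w.Current); // current culture, like WriteLine(double)
            }
            if (!b.MoveNext()) throw new InvalidDataException("biases file is too short");
            biases[l][j] = double.Parse(b.Current);
        }
    }
    return (weights, biases);
}

// Demo with a hypothetical 2 -> 3 network whose files hold simple stand-in values
double[][] demoLayers = { new double[2], new double[3] };
File.WriteAllLines("weights.txt", new[] { "0", "1", "2", "3", "4", "5" });
File.WriteAllLines("biases.txt", new[] { "10", "11", "12" });
var (loadedWeights, loadedBiases) = Load(demoLayers, "weights.txt", "biases.txt");
Console.WriteLine(loadedWeights[1][2, 1]); // 5
```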
Christian O.

> But it literally takes forever to go through all of the values. Is there no way to write the contents directly without having to write them in a string first?

One option would be to save it as binary data. This makes it much harder for humans to read, but for large amounts of data it is usually preferable, since it saves a lot of time both when writing and when reading. For example, using BinaryWriter and unsafe code:

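// Needs an unsafe context (e.g. an unsafe method) and <AllowUnsafeBlocks>true</AllowUnsafeBlocks>
// Write the dimensions first so the reader knows how many values follow
myBinaryWriter.Write(myArray.GetLength(0));
myBinaryWriter.Write(myArray.GetLength(1));
// Pin the 2D array; its elements are stored contiguously in row-major order
fixed (double* ptr = myArray)
{
    // View the whole array as raw bytes and write them with a single call
    var span = new ReadOnlySpan<byte>(ptr, myArray.Length * sizeof(double));
    myBinaryWriter.Write(span);
}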
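For a whole `double[][,]` plus `double[][]`, the same idea also works without unsafe code by writing each `double` through `BinaryWriter.Write(double)` and reading it back with `BinaryReader.ReadDouble()`. This is an illustration rather than the answer's code; the file layout (layer count, then rows, columns, weights and biases per layer), the `Save`/`Load` helper names and the `model.bin` file name are assumptions made for this example.

```csharp
using System;
using System.IO;

// Layout: layer count, then for each layer l >= 1: rows, cols, rows*cols weights, rows biases.
// Index 0 is skipped because, as in the question, the input layer has no incoming weights.
void Save(string path, double[][,] weights, double[][] biases)
{
    using var writer = new BinaryWriter(File.Create(path));
    writer.Write(weights.Length);
    for (int l = 1; l < weights.Length; l++)
    {
        int rows = weights[l].GetLength(0), cols = weights[l].GetLength(1);
        writer.Write(rows);
        writer.Write(cols);
        for (int j = 0; j < rows; j++)
            for (int k = 0; k < cols; k++)
                writer.Write(weights[l][j, k]); // 8 bytes per double, no text conversion
        for (int j = 0; j < rows; j++)
            writer.Write(biases[l][j]);
    }
}

(double[][,], double[][]) Load(string path)
{
    using var reader = new BinaryReader(File.OpenRead(path));
    int count = reader.ReadInt32();
    var weights = new double[count][,];
    var biases = new double[count][];
    for (int l = 1; l < count; l++)
    {
        int rows = reader.ReadInt32(), cols = reader.ReadInt32();
        weights[l] = new double[rows, cols];
        biases[l] = new double[rows];
        for (int j = 0; j < rows; j++)
            for (int k = 0; k < cols; k++)
                weights[l][j, k] = reader.ReadDouble();
        for (int j = 0; j < rows; j++)
            biases[l][j] = reader.ReadDouble();
    }
    return (weights, biases);
}

// Demo with a hypothetical 2 -> 3 network
var w = new double[2][,];
var b = new double[2][];
w[1] = new double[,] { { 0.1, 0.2 }, { 0.3, 0.4 }, { 0.5, 0.6 } };
b[1] = new double[] { 1.5, -2.25, 3.0 };
Save("model.bin", w, b);
var (w2, b2) = Load("model.bin");
Console.WriteLine(w2[1][2, 1] == w[1][2, 1]); // True
```

Binary values round-trip bit for bit, and BinaryWriter always uses little-endian byte order, so the file reads back identically on any platform.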

You might also consider using a binary serialization library like protobuf-net that can take an object and serialize it to a stream. Note that some libraries may need attributes to be added to classes and properties. Some libraries may also have issues with multidimensional and/or jagged arrays. Because of this, it can sometimes be useful to define your own 2D array type that uses a 1D array as the backing storage; this can make things like serialization or passing data to other components much simpler.
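The core of such a type is just a flat `double[]` plus row-major indexing. The sketch below shows that idea without wrapping it in a class (a real program would add an indexer and size checks); the `Index` helper and the `matrix.bin` file name are hypothetical, and `Stream.ReadExactly` assumes .NET 7 or later. Because the storage is an ordinary 1D array, `MemoryMarshal.AsBytes` can expose it as bytes, so no `unsafe` code is needed.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// A 2D matrix stored as one flat double[] in row-major order: element (r, c) lives at r * cols + c
int rows = 3, cols = 2;
double[] data = new double[rows * cols];
int Index(int r, int c) => r * cols + c;

for (int r = 0; r < rows; r++)
    for (int c = 0; c < cols; c++)
        data[Index(r, c)] = r * 10 + c;

// A plain 1D array can be written as raw bytes in one call:
// MemoryMarshal.AsBytes reinterprets the Span<double> as a Span<byte> without copying
using (var writer = new BinaryWriter(File.Create("matrix.bin")))
{
    writer.Write(rows);
    writer.Write(cols);
    writer.Write(MemoryMarshal.AsBytes(data.AsSpan()));
}

// Reading back: read the header, then fill a new flat array directly from the file
double[] loaded;
using (var reader = new BinaryReader(File.OpenRead("matrix.bin")))
{
    int r2 = reader.ReadInt32(), c2 = reader.ReadInt32();
    loaded = new double[r2 * c2];
    reader.BaseStream.ReadExactly(MemoryMarshal.AsBytes(loaded.AsSpan()));
}
Console.WriteLine(loaded[Index(2, 1)]); // 21
```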

Another fairly common practice is to store metadata, like height, width, etc., in a simple human-readable text file using something like JSON or XML, while keeping the actual data in a separate raw binary file.
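A minimal sketch of that split, assuming .NET 7 or later and `System.Text.Json`: the layer sizes go into a hypothetical `network.json`, and the weights and biases (already flattened into 1D arrays) go into a raw `network.bin`. The raw file uses the machine's native byte order (little-endian on x86 and typical ARM systems), which is worth recording in the metadata if the file may move between platforms.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using System.Text.Json;

// Hypothetical 3 -> 2 network, with the weights kept in a flat row-major double[]
int[] layerSizes = { 3, 2 };
double[] weights = { 0.1, 0.2, 0.3, 0.4, 0.5, 0.6 }; // 2 rows x 3 columns
double[] biases = { -1.0, 1.0 };

// Human-readable metadata goes into a small JSON file...
File.WriteAllText("network.json", JsonSerializer.Serialize(layerSizes));

// ...while the bulk data goes into a raw binary file: weights first, then biases
using (FileStream data = File.Create("network.bin"))
{
    data.Write(MemoryMarshal.AsBytes(weights.AsSpan()));
    data.Write(MemoryMarshal.AsBytes(biases.AsSpan()));
}

// Loading: the metadata says how many doubles to expect in the binary file
int[] sizes = JsonSerializer.Deserialize<int[]>(File.ReadAllText("network.json"))!;
double[] loadedWeights = new double[sizes[0] * sizes[1]];
double[] loadedBiases = new double[sizes[1]];
using (FileStream data = File.OpenRead("network.bin"))
{
    data.ReadExactly(MemoryMarshal.AsBytes(loadedWeights.AsSpan()));
    data.ReadExactly(MemoryMarshal.AsBytes(loadedBiases.AsSpan()));
}
Console.WriteLine(File.ReadAllText("network.json")); // [3,2]
```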

JonasH
  1. Bad option: JSON serialization.

  2. So-so option: write to the file immediately, e.g. with File.AppendText.

  3. IMHO the best option: use a database.

  4. IMHO a good option: use BinaryFormatter (you will not be able to read the file yourself, but the application will).

  5. Working option: use StringBuilder.

  • _"IMHO good variant - use BinaryFormatter"_ - Hmmm. Docs say: _"Warning - BinaryFormatter is insecure and can't be made secure. For more information, see the BinaryFormatter security guide."_ , among other downsides... – Fildor May 06 '22 at 12:27
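using System.IO;
using System.Text;

StringBuilder weightsSB = new StringBuilder();
StringBuilder biasesSB = new StringBuilder();

for (int l = 1; l < layers.Length; l++)
{
    for (int j = 0; j < layers[l].Length; j++)
    {
        for (int k = 0; k < layers[l - 1].Length; k++)
        {
            // Append the value and the newline separately instead of concatenating a new string
            weightsSB.Append(weights[l][j, k]).Append('\n');
        }

        biasesSB.Append(biases[l][j]).Append('\n');
    }
}

// Same layout as before: all weights, an empty line, then all biases
File.WriteAllText(@"path", weightsSB.ToString() + "\n" + biasesSB.ToString());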

As suggested in the comments, I used a StringBuilder instead. Works like a charm.

  • I'd still suggest storing the data immediately, as it scales better. – Christian O. May 06 '22 at 12:28
  • Using a StringBuilder is going to clog memory and limit the amount that can be written. I agree with @ChristianO that writing directly to disk is the better, more stable way to do this. – Paul Sinnema May 06 '22 at 12:36