First of all, you should know that generating random data and writing it to disk is at least an order of magnitude slower than allocating a contiguous block of memory for a buffer. This definitely falls under the "premature optimization" category: eliminating the creation of the buffer inside the loop will not make your code noticeably faster.
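If you want to see the difference yourself, here's a quick benchmark sketch (illustrative only; put it in a _test.go file importing testing and math/rand; the package-level sink keeps the compiler from optimizing the allocation away, and the disk writes would widen the gap even further):

var sink []byte

func BenchmarkAlloc(b *testing.B) {
    for i := 0; i < b.N; i++ {
        sink = make([]byte, 1<<20) // allocation only
    }
}

func BenchmarkRandFill(b *testing.B) {
    buf := make([]byte, 1<<20)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        rand.Read(buf) // generating 1 MB of random data
    }
}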
Reusing the buffer
But to reuse the buffer, move it outside of the loop, create it with the biggest size you'll need, and in each iteration slice it to the needed size. It's OK to do this, because we'll overwrite the whole sliced part with random data.
Note that I somewhat changed the size generation (likely an error in your code: you kept increasing the size of the generated temporary files, since you used the accumulated size for new ones).
Also note that writing a file whose contents are prepared in a []byte is easiest done using a single call to os.WriteFile().
Something like this:
bigRaw := make([]byte, 1 << 32)

for totalSize := int64(0); ; {
    size := rand.Int63n(1 << 32) // random size up to 4 GB
    totalSize += size
    if totalSize >= temporaryFilesTotalSize {
        break
    }
    raw := bigRaw[:size]
    rand.Read(raw) // It's documented that rand.Read() always returns a nil error
    filePath := filepath.Join(dir, random.HexString(12))
    if err := os.WriteFile(filePath, raw, 0666); err != nil {
        panic(err)
    }
    files = append(files, filePath)
}
Solving the task without an intermediate buffer
Since you are writing big files (GBs), allocating such a big buffer is not a good idea: running the app will require GBs of RAM! We could improve on this with an inner loop that reuses a smaller buffer until the expected size is written, which solves the big memory issue but increases complexity (see the sketch below). Luckily for us, we can solve the task without any buffers, and even with decreased complexity!
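For comparison, the chunked approach could look something like this (a sketch only, reusing size and filePath from the snippets above; the 1 MB buffer size is an arbitrary choice):

buf := make([]byte, 1<<20) // fixed 1 MB working buffer, reused for every file

file, err := os.Create(filePath)
if err != nil {
    panic(err)
}
for remaining := size; remaining > 0; {
    chunk := buf
    if remaining < int64(len(buf)) {
        chunk = buf[:remaining] // last, partial chunk
    }
    rand.Read(chunk)
    if _, err := file.Write(chunk); err != nil {
        panic(err)
    }
    remaining -= int64(len(chunk))
}
if err := file.Close(); err != nil {
    panic(err)
}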
We should somehow "channel" the random data from a rand.Rand to the file directly, similar to what io.Copy() does. Note that rand.Rand implements io.Reader, and os.File implements io.ReaderFrom, which suggests we could simply pass a rand.Rand to file.ReadFrom(), and the file itself would read the data to be written directly from the rand.Rand.
This sounds good, but ReadFrom() reads data from the given reader until EOF or an error occurs. Neither will ever happen if we pass our rand.Rand. And we do know how many bytes we want read and written: size.
To our "rescue" comes io.LimitReader()
: we pass an io.Reader
and a size to it, and the returned reader will supply no more than the given number of bytes, and after that will report EOF.
Note that creating our own rand.Rand will also be faster, as the source we pass to it is created with rand.NewSource(), which returns an "unsynchronized" source (not safe for concurrent use), which in turn is faster! The source used by the default/global rand.Rand is synchronized (and so safe for concurrent use, but slower).
Perfect! Let's see this in action:
r := rand.New(rand.NewSource(time.Now().UnixNano()))

for totalSize := int64(0); ; {
    size := r.Int63n(1 << 32) // random size up to 4 GB
    totalSize += size
    if totalSize >= temporaryFilesTotalSize {
        break
    }
    filePath := filepath.Join(dir, random.HexString(12))
    file, err := os.Create(filePath)
    if err != nil {
        return nil, err
    }
    if _, err := file.ReadFrom(io.LimitReader(r, size)); err != nil {
        panic(err)
    }
    if err = file.Close(); err != nil {
        panic(err)
    }
    files = append(files, filePath)
}
Note that if os.File did not implement io.ReaderFrom, we could still use io.Copy(), providing the file as the destination and a limited reader (like the one used above) as the source.
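With io.Copy(), the file.ReadFrom() call in the snippet above would simply become (a sketch, using the same r, size and file):

// io.Copy() reads from the limited reader until EOF (after size bytes)
// and writes everything it reads to the file.
if _, err := io.Copy(file, io.LimitReader(r, size)); err != nil {
    panic(err)
}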
Final note: closing the file (or any resource) is best done using defer, so it gets called no matter what. Using defer in a loop is a bit tricky though, as deferred functions run at the end of the enclosing function, not at the end of the loop's iteration. So you may wrap the loop body in a function, as sketched below. For details, see `defer` in the loop - what will be better?
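Here's a sketch of how the body of the loop above could be wrapped in an anonymous function (the named result also lets the deferred Close() report its error):

err := func() (err error) {
    file, err := os.Create(filePath)
    if err != nil {
        return err
    }
    // The deferred Close() runs when this anonymous function returns,
    // i.e. at the end of each loop iteration, not at the end of the
    // enclosing function.
    defer func() {
        if cerr := file.Close(); err == nil {
            err = cerr // don't lose the error reported by Close()
        }
    }()
    _, err = file.ReadFrom(io.LimitReader(r, size))
    return err
}()
if err != nil {
    return nil, err
}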