
I am creating a file of a specified size - I don't care what data is in it, although random would be nice. Currently I am doing this:

        var sizeInMB = 3; // Up to many Gb
        using (FileStream stream = new FileStream(fileName, FileMode.Create))
        {
            using (BinaryWriter writer = new BinaryWriter(stream))
            {
                while (writer.BaseStream.Length <= sizeInMB * 1000000)
                {
                    writer.Write("a"); //This could be random. Also, larger strings improve performance obviously
                }
                writer.Close();
            }
        }

This isn't efficient or even the right way to go about it. Any higher performance solutions?

Thanks for all the answers.

Edit

Ran some tests on the following methods for a 2 GB file (times in ms):

Method 1: Jon Skeet

byte[] data = new byte[sizeInMb * 1024 * 1024];
Random rng = new Random();
rng.NextBytes(data);
File.WriteAllBytes(fileName, data);

N/A - OutOfMemoryException for a 2 GB file

Method 2: Jon Skeet

byte[] data = new byte[8192];
Random rng = new Random();
using (FileStream stream = File.OpenWrite(fileName))
{
    for (int i = 0; i < sizeInMB * 128; i++)
    {
         rng.NextBytes(data);
         stream.Write(data, 0, data.Length);
    }
}

@1 KB - 45,868, 23,283, 23,346

@8 KB - 30,426, 22,936, 22,936

@128 KB - 24,877, 20,585, 20,716

Method 3 - Hans Passant (super fast, but the data isn't random)

using (var fs = new FileStream(fileName, FileMode.Create, FileAccess.Write, FileShare.None))
{
    fs.SetLength(sizeInMB * 1024 * 1024);
}

257, 287, 3, 3, 2, 3 etc.
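One caveat worth noting (my addition, not part of the answer): `sizeInMB * 1024 * 1024` is 32-bit integer arithmetic, which overflows once `sizeInMB` reaches 2048. Casting to `long` before multiplying avoids that:

    using (var fs = new FileStream(fileName, FileMode.Create, FileAccess.Write, FileShare.None))
    {
        // long arithmetic: safe for files of 2 GB and beyond
        fs.SetLength((long)sizeInMB * 1024 * 1024);
    }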

Jason

5 Answers


Well, a very simple solution:

byte[] data = new byte[sizeInMb * 1024 * 1024];
Random rng = new Random();
rng.NextBytes(data);
File.WriteAllBytes(fileName, data);

A slightly more memory efficient version :)

// Note: block size must be a factor of 1MB to avoid rounding errors :)
const int blockSize = 1024 * 8;
const int blocksPerMb = (1024 * 1024) / blockSize;
byte[] data = new byte[blockSize];
Random rng = new Random();
using (FileStream stream = File.OpenWrite(fileName))
{
    for (int i = 0; i < sizeInMb * blocksPerMb; i++)
    {
        rng.NextBytes(data);
        stream.Write(data, 0, data.Length);
    }
}

However, if you do this several times in very quick succession creating a new instance of Random each time, you may get duplicate data. See my article on randomness for more information - you could avoid this using System.Security.Cryptography.RandomNumberGenerator... or by reusing the same instance of Random multiple times - with the caveat that it's not thread-safe.
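To make that caveat concrete, here is a sketch of the same blocked write using `RNGCryptoServiceProvider` (the cryptographic generator mentioned above); it's slower than `Random`, but successive runs can't produce identical output the way time-seeded `Random` instances can:

    const int blockSize = 1024 * 8;
    byte[] data = new byte[blockSize];
    using (var rng = new System.Security.Cryptography.RNGCryptoServiceProvider())
    using (FileStream stream = File.OpenWrite(fileName))
    {
        for (int i = 0; i < sizeInMb * (1024 * 1024) / blockSize; i++)
        {
            rng.GetBytes(data); // fills the buffer with cryptographically strong random bytes
            stream.Write(data, 0, data.Length);
        }
    }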

Jon Skeet
  • I'd go for a 128k block size, which tends to provide superior performance in most I/O tests. 4k at a minimum, since that's the page size on 32-bit Windows OS. – Ben Voigt Dec 13 '10 at 19:35
  • 2
    @Ben: I would try to avoid using 128K as that will go on the large object heap. I'll up it to 8K though :) – Jon Skeet Dec 13 '10 at 19:45
  • Someone downvoted the question too. Think it was a down-vote rampage. – Jason Dec 14 '10 at 15:01

There's no faster way than taking advantage of the sparse file support built into NTFS, the file system Windows uses on hard disks. This code creates a one-gigabyte file in a fraction of a second:

using System;
using System.IO;

class Program {
    static void Main(string[] args) {
        using (var fs = new FileStream(@"c:\temp\onegigabyte.bin", FileMode.Create, FileAccess.Write, FileShare.None)) {
            fs.SetLength(1024 * 1024 * 1024);
        }
    }
}

When read, the file contains only zeros.

Hans Passant
  • Don't you have to explicitly enable sparseness when creating a file? – Ben Voigt Dec 13 '10 at 19:36
  • Nope, but I did run reflector on the source code. No sign of [`DeviceIoControl(FSCTL_SET_SPARSE)`](http://msdn.microsoft.com/en-us/library/aa364596.aspx) anywhere in FileStream. Are you sure the "in a fraction of a second" isn't write caching at work? – Ben Voigt Dec 13 '10 at 19:53
  • And now I did try it... the file properties show that "Size on Disk" == "Total Size", which would not be the case for a sparse file. – Ben Voigt Dec 13 '10 at 20:02
  • Make it larger than the size of your file system cache. – Hans Passant Dec 13 '10 at 20:04
  • @Hans: I just created a 26GB file using your code plus the obvious changes. The free space on my drive immediately dropped by 26GB. Empty clusters may not need to take up space in the cache, but they sure are allocated in the volume bitmap. With a sparse file they would not be. – Ben Voigt Dec 13 '10 at 20:08
  • Well, a file system would be wise to consider the space 'reserved'. But that's hardly the point of this question, *how long did it take*? – Hans Passant Dec 13 '10 at 20:12
  • @Hans: The code does what was requested. But the accompanying comments about NTFS sparse file support are, to be blunt, pure rubbish. Or else you've got another piece of code that does use sparse file support and is at least as fast, that you're not sharing. – Ben Voigt Dec 13 '10 at 20:21
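Following up on Ben Voigt's comments: NTFS only treats a file as sparse if it is explicitly marked with `FSCTL_SET_SPARSE` before its length is set. A hypothetical, untested sketch of that approach (the P/Invoke declaration and control-code constant are my additions, not from the answer):

    using System;
    using System.IO;
    using System.Runtime.InteropServices;
    using Microsoft.Win32.SafeHandles;

    class SparseFile {
        const uint FSCTL_SET_SPARSE = 0x000900C4;

        [DllImport("kernel32.dll", SetLastError = true)]
        static extern bool DeviceIoControl(SafeFileHandle hDevice, uint dwIoControlCode,
            IntPtr lpInBuffer, uint nInBufferSize, IntPtr lpOutBuffer, uint nOutBufferSize,
            out uint lpBytesReturned, IntPtr lpOverlapped);

        static void Main() {
            using (var fs = new FileStream(@"c:\temp\onegigabyte.bin", FileMode.Create,
                                           FileAccess.Write, FileShare.None)) {
                uint returned;
                // Mark the file sparse first; only then does SetLength avoid
                // allocating clusters in the volume bitmap.
                if (!DeviceIoControl(fs.SafeFileHandle, FSCTL_SET_SPARSE,
                                     IntPtr.Zero, 0, IntPtr.Zero, 0,
                                     out returned, IntPtr.Zero))
                    throw new IOException("FSCTL_SET_SPARSE failed");
                fs.SetLength(1024L * 1024 * 1024); // logical 1 GB, near-zero bytes on disk
            }
        }
    }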

You can use the following class, which I created, to generate random strings:

using System;
using System.Text;

public class RandomStringGenerator
{
    readonly Random random;

    public RandomStringGenerator()
    {
        random = new Random();
    }
    public string Generate(int length)
    {
        if (length < 0)
        {
            throw new ArgumentOutOfRangeException("length");
        }
        var stringBuilder = new StringBuilder();

        for (int i = 0; i < length; i++)
        {
            char ch = (char)random.Next(0, 255);
            stringBuilder.Append(ch);
        }

        return stringBuilder.ToString();

    }

}

Usage:

    int length = 10;
    string randomString = new RandomStringGenerator().Generate(length);
Sergey K
  • -1 This will also be quite slow, as it is specific to in-memory strings, and does not optimize for the OP's case of writing the data directly to a file. There is no need to work with chars (which are twice as large as bytes), nor retain the entire string of bytes in memory. – cdhowie Dec 13 '10 at 18:38
  • +1 to compensate downvote. Solution is not optimal but better than nothing, so downvote is not justified. – Nicolas Raoul Oct 25 '12 at 05:48

The efficient way to create a large file:

    using (FileStream fs = new FileStream(@"C:\temp\out.dat", FileMode.Create))
    {
        fs.Seek(1024 * 6, SeekOrigin.Begin);
        byte[] bytes = new System.Text.UTF8Encoding().GetBytes("test");
        fs.Write(bytes, 0, bytes.Length);
    }

However, this file will be empty except for the "test" at the end. It's not clear what exactly you are trying to do: a large file with data, or just a large file. You can modify this to sparsely write some data in the file without filling it up completely. If you do want the entire file filled with random data, then the only way I can think of is using random bytes, as in Jon's answer above.

MK.

An improvement would be to fill a buffer of the desired size with the data and flush it all at once.

Mircea Nistor