The output should be a large text file in which each line has the form Number. String, where the text is random:

347. Bus
20175. Yes Yes
15. The same
2. Hello world
178. Tree

The file size must be specified in bytes. I am interested in the fastest way to generate files of about 1000 MB or more.

Here is my code for generating random text:

public string[] GetRandomTextWithIndexes(int size)
{
    var result = new string[size];

    var sw = Stopwatch.StartNew();
    var indexes = Enumerable.Range(0, size).AsParallel().OrderBy(g => GenerateRandomNumber(0, 5)).ToList();
    sw.Stop();
    Console.WriteLine("Queue fill: " + sw.Elapsed);

    sw = Stopwatch.StartNew();
    Parallel.For(0, size, i =>
    {
        var text = GetRandomText(GenerateRandomNumber(1, 20));
        result[i] = $"{indexes[i]}. {text}";
    });

    sw.Stop();
    Console.WriteLine("Text fill: " + sw.Elapsed);

    return result;
}

public string GetRandomText(int size)
{
    var builder = new StringBuilder();

    for (var i = 0; i < size; i++)
    {
        var character = LegalCharacters[GenerateRandomNumber(0, LegalCharacters.Length)];
        builder.Append(character);
    }

    return builder.ToString();
}

private int GenerateRandomNumber(int min, int max)
{
    lock (_synlock)
    {
        if (_random == null)
            _random = new Random();
        return _random.Next(min, max);
    }
}

I don't know how to make this code work with a size given in MB rather than a number of strings. When I set size to about 1000000000, I get an OutOfMemoryException. Also, maybe there is a faster way to generate the indexes.

Daniel B
Artem Kyba
  • I'd suggest writing to the file as you go rather than building the whole thing as a string in memory. Getting to the OutOfMemoryException more rapidly would seem to solve only part of the problem. – 15ee8f99-57ff-4f92-890c-b56153 Jun 01 '18 at 17:29
  • Unless you're an expert in writing slow code the actual generation of the data to output to the file will be dwarfed by the time it takes to write the data to the file. I wouldn't worry too much about "optimizing" the generation part. Oh, and you shouldn't generate the whole file in memory, you should open the streamwriter and write to it as you generate data. The operating system is quite good at handling buffers and caches on your behalf. – Lasse V. Karlsen Jun 01 '18 at 17:30
  • Would it be faster to generate a sort of zip bomb? – Marker Jun 01 '18 at 17:42
  • Can you put full exception – Daniel B Jun 01 '18 at 18:19

2 Answers

  1. The disk is your bottleneck, so there is no need for parallel processing.
  2. There is no need to store everything in memory before writing; write each line as soon as it is generated.

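// File.Create truncates an existing file; File.OpenWrite would leave stale bytes past the new end.
using (var fs = File.Create(@"c:\w\test.txt"))
using (var w = new StreamWriter(fs))
{
    // size is the number of lines to write, not the file size in bytes.
    for (var i = 0; i < size; i++)
    {
        var text = GetRandomText(GenerateRandomNumber(1, 20));
        var number = GenerateRandomNumber(0, 5);
        var line = $"{number}. {text}";
        w.WriteLine(line); // Written as generated; nothing is accumulated in memory.
    }
}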
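
The loop above counts lines, while the question asks for a size in bytes. One way to bridge that gap is to keep a running byte total and stop once the target is reached. The following is only a sketch under assumptions that are not in the original post: the class name RandomFileGenerator, the LegalCharacters alphabet, the seed parameter, and the 1 MB buffer are all made up for illustration, and every character is ASCII so that one char equals one byte in UTF-8.

```csharp
using System;
using System.IO;
using System.Text;

public static class RandomFileGenerator
{
    // Hypothetical alphabet: the question's LegalCharacters field is not shown.
    // ASCII only, so each character is exactly one byte in UTF-8.
    private const string LegalCharacters = "abcdefghijklmnopqrstuvwxyz ";

    // Writes "Number. Text" lines until at least targetBytes bytes are written.
    // Returns the number of bytes actually written.
    public static long Generate(string path, long targetBytes, int seed)
    {
        var random = new Random(seed);          // single thread, so no lock is needed
        var encoding = new UTF8Encoding(false); // no byte order mark
        var builder = new StringBuilder(64);
        long written = 0;

        // FileMode.Create truncates an existing file; the 1 MB buffer is an arbitrary choice.
        using (var stream = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None, 1 << 20))
        using (var writer = new StreamWriter(stream, encoding))
        {
            while (written < targetBytes)
            {
                builder.Clear();
                builder.Append(random.Next(0, int.MaxValue)).Append(". ");

                int length = random.Next(1, 20); // 1..19 characters, as in the question
                for (int i = 0; i < length; i++)
                    builder.Append(LegalCharacters[random.Next(LegalCharacters.Length)]);

                builder.Append('\n');
                writer.Write(builder.ToString());
                written += builder.Length;       // valid only because the content is ASCII
            }
        }

        return written;
    }
}
```

A call such as RandomFileGenerator.Generate(@"c:\w\test.txt", 1000L * 1024 * 1024, 42) would produce roughly 1000 MB. The file overshoots the target by less than one line (at most 31 bytes with these settings), because the last line is always written in full rather than cut off.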
Pavel Tupitsyn

It would help to include the full exception in the question. I bet it is thrown at this line:

var result = new string[size];

A length of 1000000000 is too large for a string array. If you run this dotnetfiddle, you will get:

Run-time exception (line 12): Array dimensions exceeded supported range. Stack Trace: [System.OutOfMemoryException: Array dimensions exceeded supported range.] at Program.Main() :line 12
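
To see why the allocation is so large, it helps to work out the arithmetic. The sketch below only estimates the memory needed for the references stored in the array, before a single string is created. The class name ArrayFootprint and the method ReferenceBytes are invented for illustration, and an 8-byte reference (a 64-bit process) is an assumption.

```csharp
using System;

public static class ArrayFootprint
{
    // Bytes needed for the references alone in a string[] of the given length.
    // The array header and the string objects themselves are not counted.
    public static long ReferenceBytes(long length, int referenceSize)
    {
        return length * referenceSize;
    }
}
```

With 8-byte references, ArrayFootprint.ReferenceBytes(1000000000, 8) returns 8000000000, so the array of references alone needs about 8 GB, and each generated string adds its own object overhead on top of that.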

Have a look at the following questions to see why you get that exception and what the workaround is:

What is the Maximum Size that an Array can hold?

Can't create huge arrays

Error when Dictionary count is bigger as 89478457

Daniel B