
I want to generate a random data file in Windows using either CMD or PowerShell, with the file consisting of many lines of random text. I've managed this in PowerShell with the following command, but it took around 1 minute to generate 1MB of data, which is far too slow for generating GBs:

1..100000 | % { [System.Web.Security.Membership]::GeneratePassword(70, 3) >> C:/dummy.txt }

File output should be the following:

YyS@ZRU98udC3q#R@5o7AR$*Bh44v22J!ekKSpIAgLQyp^pbBx
s8Wm589aYYH39@Arb2^ZRMPjx2UaEwHYkMmhgFaU$QyAU@@@WU
yB^!qo6e4x*eFvx%ZY7738&&FkhHXU24OCJCxfyQ7a%peo!$ap
...........
...........
$GVhMrkZfJbIkgAgri0w9lFVt6a^vXh6ev&jwPHGfoE!pVW85r

Does anyone have any suggestions? Preferably I can do this without the need for external tools, as the script to create this data will be run automatically during startup of the machine.

Laurie
  • Can there be duplicate lines? – Doug Maurer Oct 24 '20 at 20:28
  • What is "[way] too slow"? 1 minute, 1 hour? – Scratte Oct 24 '20 at 21:35
  • Do you want truly random or pseudo-random? Can there be random words or other strings, or must it be random characters? I ask because I have a 6-dice PW generator that lets you choose pseudo-random or truly random using quantum number generation. – Ben Personick Oct 25 '20 at 00:43
  • @DougMaurer preferably randomized, for database purposes later on in the pipeline. @Ben Personick either would be fine as long as it's not the same string being repeated over and over. – Laurie Oct 25 '20 at 13:53
  • @Scratte apologies, "way too slow" was around 1 minute per MB; I've updated my question. – Laurie Oct 25 '20 at 13:53

2 Answers


Your current approach is slow because the >> inside the loop opens, appends to, and closes the same file 100K times. Move the redirection operator outside the pipeline so that overhead is incurred only once:

1..100000 | % { [System.Web.Security.Membership]::GeneratePassword(70, 3) } >> .\dummy.txt

To illustrate the difference, here are measurements for 1000 passwords with your original command vs. the version with the output redirection moved outside:

PS ~>
>> Measure-Command {
>>   1..1000 | % { [System.Web.Security.Membership]::GeneratePassword(70, 3) >> .\dummy.txt }
>> } |Select TotalMilliseconds

TotalMilliseconds
-----------------
        8881.9736


PS ~>
>> Measure-Command {
>>   1..1000 | % { [System.Web.Security.Membership]::GeneratePassword(70, 3) } >> .\dummy.txt
>> } |Select TotalMilliseconds

TotalMilliseconds
-----------------
          72.7485

That is literally more than 100 times faster, already at only 1000 lines.
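The same "open once, write many times" idea can also be made explicit with a single file stream. The following is only a minimal sketch, not the method used above: New-RandomLine is a hypothetical helper (built on System.Random so it does not depend on System.Web), and the temp-folder path and alphabet are assumptions chosen for illustration.

```powershell
# Hypothetical helper: builds one line of random characters from a fixed alphabet.
function New-RandomLine {
    param([int]$Length, [System.Random]$Rng)
    $alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!@#$%^&*'
    $buffer = [char[]]::new($Length)
    for ($i = 0; $i -lt $Length; $i++) {
        $buffer[$i] = $alphabet[$Rng.Next($alphabet.Length)]
    }
    -join $buffer   # join the characters into a single string
}

# Assumed output location; an absolute path avoids surprises with .NET's working directory.
$path   = Join-Path ([System.IO.Path]::GetTempPath()) 'dummy.txt'
$rng    = [System.Random]::new()
$writer = [System.IO.StreamWriter]::new($path)   # the file is opened once here
try {
    foreach ($n in 1..1000) {
        $writer.WriteLine((New-RandomLine -Length 70 -Rng $rng))
    }
}
finally {
    $writer.Dispose()   # flushes and closes the file once
}
```

Unlike >>, this overwrites any existing file at that path, and the file is only opened and closed a single time regardless of how many lines are written.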

Mathias R. Jessen
  • Nice catch, easily overlooked (by me!) – Doug Maurer Oct 25 '20 at 14:10
  • I can't believe how much faster this approach is, just from that change. I've gone from around 1MB in one minute to 400MB. Thanks for the help, this is exactly what I was looking for. – Laurie Oct 25 '20 at 14:17

This should get you underway. The main point is that it uses C# to generate the random strings, and you can control which characters are used. It uses Linq, so it could be made to run much faster, but you should already see a massive performance boost compared to random generation in PS. It will generate reasonably sized text files; if you want GB-sized data, you'll want to look into other methods.

$code = @'
using System;
using System.Linq;

namespace HelloWorld
{
    public class Program
    {
        const string chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
        private static Random random = new Random();

        public static string RandomString(int length) // much faster than calling Get-Random per character in PowerShell
        {
            // pick one random character from chars for each position
            return new string(Enumerable.Repeat(chars, length)
              .Select(s => s[random.Next(s.Length)]).ToArray());
        }

    }
}
'@

Add-Type -TypeDefinition $code -Language CSharp
$lines = 100000
$length = 70
$outfile = "e:\temp\dummy.txt" # tweak as needed
$sb = [System.Text.StringBuilder]::new() # spin up a stringbuilder to hold characters
# create a string buffer with $lines of $length characters
1..$lines | % {
    [void]$sb.AppendLine([HelloWorld.Program]::RandomString($length))
}
$sb.ToString() | Out-File $outfile # write out results
cat $outfile -Tail 10 # show last 10 lines in output file

This takes 2-3 seconds on my system (including the cat).
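For GB-sized files, one possible "other method" is to avoid holding everything in a StringBuilder and instead stream each line straight to disk from inside the compiled code. This is only a sketch under assumptions: the DummyData.Generator class, its WriteFile method, the output path, and the 10MB target are all made up for illustration and are not part of the code above.

```powershell
$source = @'
using System;
using System.IO;

namespace DummyData
{
    // Hypothetical helper class, named for this example only.
    public static class Generator
    {
        const string Chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

        // Writes lines of random characters until at least targetBytes have been written.
        // Returns the byte count written (ASCII-only text, so one byte per character).
        public static long WriteFile(string path, int lineLength, long targetBytes)
        {
            var random = new Random();
            var line = new char[lineLength];
            long written = 0;
            using (var writer = new StreamWriter(path, false)) // overwrite, UTF-8 without BOM
            {
                while (written < targetBytes)
                {
                    for (int i = 0; i < line.Length; i++)
                    {
                        line[i] = Chars[random.Next(Chars.Length)];
                    }
                    writer.WriteLine(line); // buffered write, no per-line file open
                    written += line.Length + writer.NewLine.Length;
                }
            }
            return written;
        }
    }
}
'@
Add-Type -TypeDefinition $source -Language CSharp

# Assumed output location and size; raise the target (for example 1GB) as needed.
$outfile = Join-Path ([System.IO.Path]::GetTempPath()) 'dummy-large.txt'
$bytes = [DummyData.Generator]::WriteFile($outfile, 70, 10MB)
```

Memory use stays flat because only one line is held at a time, so the target size is limited by disk space rather than RAM.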

Rno
  • Thanks for the suggestion, I think I'll go with the accepted answer as it involves a tiny change to my current method, and speeds up the process significantly. – Laurie Oct 25 '20 at 14:19