
I'd like to use PowerShell to create a random text file for use in basic system testing (upload, download, checksum, etc.). I've used the following articles and come up with my own code snippet to create a random text file, but the performance is terrible.

Here is my code sample, which takes approximately 227 seconds to generate a 1MB random text file on a modern Windows 7 Dell laptop. Run time was measured with the Measure-Command cmdlet. I repeated the test several times under different system loads, with similarly long run times each time.

# select characters from 0-9, A-Z, and a-z
$chars = [char[]] ([char]'0'..[char]'9' + [char]'A'..[char]'Z' + [char]'a'..[char]'z')
# write file using 128 byte lines each with 126 random characters
1..(1mb/128) | %{-join (1..126 | %{get-random -InputObject $chars }) } `
  | out-file test.txt -Encoding ASCII

I am looking for answers that discuss why this code performs poorly and suggest simple changes I can make to improve the runtime for generating a similar random text file (ASCII text lines of 126 random alphanumeric characters - 128 bytes per line including the "\r\n" EOL, output file an even number of megabytes such as the 1MB sample above). I would like file output to be written in pieces (one or more lines at a time) so that we never need a string the size of the output file stored in memory.

Community
Mister_Tom
  • Using technique from @mjolinor we reduced run-time on my system to roughly 30 seconds per MB. To improve on this I'm thinking I might want to use a language other than Powershell - testing some of the other file writing suggestions for same output requirements yielded tiny improvements. – Mister_Tom Jan 28 '15 at 17:08

4 Answers


Agree with @dugas that the bottleneck is calling Get-Random for every character.

You should be able to achieve nearly the same randomness if you increase your character array set and use the -Count parameter of Get-Random.

If you have PowerShell V4, the .foreach() method is considerably faster than ForEach-Object.

Also traded Out-File for Add-Content, which should also help.

# select characters from 0-9, A-Z, and a-z
$chars = [char[]] ([char]'0'..[char]'9' + [char]'A'..[char]'Z' + [char]'a'..[char]'z')
$chars = $chars * 126
# write file using 128 byte lines each with 126 random characters
(1..(1mb/128)).foreach({-join (Get-Random $chars -Count 126) | add-content testfile.txt }) 

That finished in about 32 seconds on my system.
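To check the .foreach() vs ForEach-Object claim on your own system, a micro-benchmark along these lines can be used (V4+ only; the iteration count and the trivial loop body are arbitrary, and absolute numbers will vary by machine):

```powershell
# compare the .foreach() method (V4+) against the ForEach-Object cmdlet
$n = 1..10000
(Measure-Command { $n.foreach({ $_ * 2 }) }).TotalMilliseconds
(Measure-Command { $n | ForEach-Object { $_ * 2 } }).TotalMilliseconds
```

The pipeline overhead of ForEach-Object is paid per item, which is why the difference grows with iteration count.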

Edit: Set-Content vs Out-File, using the generated test file:

$x = Get-Content testfile.txt

(Measure-Command {$x | out-file testfile1.txt}).totalmilliseconds
(Measure-Command {$x | Set-Content testfile1.txt}).totalmilliseconds

504.0069
159.0842
mjolinor
  • I like the idea of increasing the size of the character selection set by repetition. I agree that the random content generated should be similar. Interesting mention of Powershell v4 and .foreach method - I'll be trying this out too. Do you have a reference for why Add-Content might be faster than Out-File? I'll likely benchmark along with StreamWriter suggestion from StephenP. – Mister_Tom Jan 28 '15 at 02:57
  • 1
    See this: http://stackoverflow.com/questions/10655788/powershell-set-content-and-out-file-what-is-the-difference for discussion of the difference. Set-Content holds a write lock on the file, and avoids the repetitive file opening and closing @StephenP referred to. – mjolinor Jan 28 '15 at 03:18
  • Run time was reduced by 93% just by increasing selection set size and replacing the inner loop with a single call to (get-random $chars -count 126). My tests for Add-Content vs Out-File were less impressive, only time savings of approximately 4% when I replaced Out-File ASCII with Add-Content. I love the single call to get-random with increased selection set size - makes the code shorter and clearer while creating a random file much more quickly. – Mister_Tom Jan 28 '15 at 05:04
  • This solution increases the size of `$chars` from 62 to 7812. As `Get-Random` gets you random elements from the input object in a non sequential order would it be a bad idea to just use `$chars.count` for the count/divisor? That way you would get the maximum number of elements each time in random order. – Seth Dec 23 '16 at 08:10
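Following up on the idea in the last comment, a sketch of that variant would look roughly like this: shuffle the whole repeated pool in one Get-Random call by passing its full count, then slice off the first 126 elements. Whether the statistical properties are acceptable depends on your testing needs:

```powershell
# build the repeated pool, shuffle all of it, then take the first 126 chars
$chars = [char[]] ([char]'0'..[char]'9' + [char]'A'..[char]'Z' + [char]'a'..[char]'z')
$pool  = $chars * 126   # 7812 elements
$line  = -join (Get-Random -InputObject $pool -Count $pool.Count)[0..125]
```

Because the pool repeats each character 126 times, a 126-element slice of the shuffle can still contain duplicates, unlike a -Count 126 draw from the unrepeated 62-character set.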

If you are ok with punctuation you can use this:

Add-Type -AssemblyName System.Web
#get a random filename in the present working directory
$fn = [System.IO.Path]::Combine($pwd, [GUID]::NewGuid().ToString("N") + '.txt')
#set number of iterations
$count = 1mb/128
do{
  #Write the 126 chars plus EOL
  [System.Web.Security.Membership]::GeneratePassword(126,0) | Out-File $fn -Append ascii
  #decrement the counter
  $count--
}while($count -gt 0)

Which gets you to around 7 seconds. Sample Output:

0b5rc@EXV|e{kftc+1+Xn$-c%-*9q_9L}p=I=k@zrDg@HaJDcl}B(38i&m{lV@vlq%5h/a?m2X!yo]qs0=pEw:Tn4wb5F$k$O85$8F.QLvUzA{@X2-w%5(3k;BE2Qi

Using a stream writer instead of Out-File -Append avoids the open/close cycles and drops the same to 62 milliseconds.

Add-Type -AssemblyName System.Web
#get a random filename in the present working directory
$fn = [System.IO.Path]::Combine($pwd, [GUID]::NewGuid().ToString("N") + '.txt')
#set number of iterations
$count = 1mb/128
#create a filestream
$fs = New-Object System.IO.FileStream($fn,[System.IO.FileMode]::CreateNew)
#create a streamwriter
$sw = New-Object System.IO.StreamWriter($fs,[System.Text.Encoding]::ASCII,128)
do{
     #Write the 126 chars plus EOL
     $sw.WriteLine([System.Web.Security.Membership]::GeneratePassword(126,0))
     #decrement the counter
     $count--
}while($count -gt 0)
#close the streamwriter
$sw.Close()
#close the filestream
$fs.Close()

You could also use a StringBuilder, with GUIDs as the source of pseudorandom characters (hex digits only, lowercase).

#get a random filename in the present working directory
$fn = [System.IO.Path]::Combine($pwd, [GUID]::NewGuid().ToString("N") + '.txt')
#set number of iterations
$count = 1mb/128
#create a filestream
$fs = New-Object System.IO.FileStream($fn,[System.IO.FileMode]::CreateNew)
#create a streamwriter
$sw = New-Object System.IO.StreamWriter($fs,[System.Text.Encoding]::ASCII,128)
do{
    #StringBuilder capped at 126 chars; errors once the cap is hit are discarded
    $sb = New-Object System.Text.StringBuilder 126,126
    0..3 | %{$sb.Append([GUID]::NewGuid().ToString("N"))} 2> $null
    $sw.WriteLine($sb.ToString())
    #decrement the counter
    $count--
}while($count -gt 0)
#close the streamwriter
$sw.Close()
#close the filestream
$fs.Close()

This takes about 4 seconds and generates the following sample:

1fef6ccabc624e4dbe13a0415764fd2c58aa873377c7465eaecabdf6ba6fdf71c55496600a374c4c8cff75be46b1fe474230231ffccc4e3aa2753391afb32c

If you are hell bent on using the same chars as in your sample, you can do so with the following:

#get a random filename in the present working directory
$fn = [System.IO.Path]::Combine($pwd, [GUID]::NewGuid().ToString("N") + '.txt')
#array of valid chars
$chars = [char[]] ([char]'0'..[char]'9' + [char]'A'..[char]'Z' + [char]'a'..[char]'z')
#create a random object
$rand = New-Object System.Random
#set number of iterations
$count = 1mb/128
#get length of valid character array
$charslength = $chars.length
#create a filestream
$fs = New-Object System.IO.FileStream($fn,[System.IO.FileMode]::CreateNew)
#create a streamwriter
$sw = New-Object System.IO.StreamWriter($fs,[System.Text.Encoding]::ASCII,128)
do{
    #get 126 random chars - this per-character loop is the major slowdown
    $randchars = 1..126 | %{$chars[$rand.Next(0,$charslength)]}
    #Write the 126 chars plus EOL
    $sw.WriteLine([System.Text.Encoding]::ASCII.GetString($randchars))
    #decrement the counter
    $count--
}while($count -gt 0)
#close the streamwriter
$sw.Close()
#close the filestream
$fs.Close()

This takes ~27 seconds and generates the following sample:

Fev31lweOXaYKELzWOo1YJn8LpZoxonWjxQYhgZbR62EmgjHit5J1LrvqniBB7hZj4pNonIpoCZSHYLf5H63iUUN6UhtyOQKPSViqMTvbGUomPeIR36t1drEZSHJ6O

Indexing into the char array one character at a time, and Out-File -Append opening and closing the file on every write, are the major slowdowns.

StephenP
  • Interesting use of password and GUID generators for pseudo or semi random sequences. I also appreciate the tip on using StreamWriter rather than the Powershell file output cmdlets - I'll definitely be trying this type of output for some performance gain. I will also look into reducing the number of independent calls to random number generator as suggested by dugas and mjolinor. – Mister_Tom Jan 28 '15 at 02:37

One of the bottlenecks is calling the Get-Random cmdlet once per character inside the loop. On my machine, building one joined line that way takes ~40ms. If you change to something like:

%{ -join ((get-random -InputObject $chars -Count 62) + (get-random -InputObject $chars -Count 62) + (get-random -InputObject $chars -Count 2)) }

it is reduced to ~1ms.
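Dropped into the original pipeline, the whole-file version of this approach would look something like the following sketch (same output file name and encoding as the question's sample):

```powershell
# three Get-Random calls per line (62 + 62 + 2 chars) instead of 126
$chars = [char[]] ([char]'0'..[char]'9' + [char]'A'..[char]'Z' + [char]'a'..[char]'z')
1..(1mb/128) |
  %{ -join ((Get-Random -InputObject $chars -Count 62) +
            (Get-Random -InputObject $chars -Count 62) +
            (Get-Random -InputObject $chars -Count 2)) } |
  Out-File test.txt -Encoding ASCII
```

Note the trade-off: each -Count 62 draw is a permutation of the full 62-character set (no repeats within it), so the output is somewhat less random than independent per-character draws.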

dugas
  • Good point, I'll try to reduce the number of calls to random number generator. I appreciate your creative way to get more than $chars.count for the line by concatenating the output of three calls. Thanks for first post on my question :-) – Mister_Tom Jan 28 '15 at 03:22

Instead of using Get-Random to generate the text, as per mjolinor's suggestion, I improved the speed by using GUIDs.

Function New-RandomFile {
    Param(
        $Path = '.', 
        $FileSize = 1kb, 
        $FileName = [guid]::NewGuid().Guid + '.txt'
        ) 
    (1..($FileSize/128)).foreach({-join ([guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid -Replace "-").SubString(1, 126) }) | set-content "$Path\$FileName"
}

I ran both versions with Measure-Command. The original code took 1.36 seconds.

This one took 491 milliseconds. Running:

New-RandomFile -FileSize 1mb

UPDATE:

I've updated my function to use a ScriptBlock, so you can replace the 'NewGuid()' method with anything you want.

In this scenario, I make 1kb chunks, since I know I'm never creating smaller files. This improved the speed of my function drastically!

Set-Content forces a newline at the end, which is why you needed to trim 2 characters each time you wrote to the file. I've replaced it with [io.file]::WriteAllText() instead.

Function New-RandomFile_1kChunks {
    Param(
        $Path = (Resolve-Path '.').Path, 
        $FileSize = 1kb, 
        $FileName = [guid]::NewGuid().Guid + '.txt'
        ) 

    $Chunk = { [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid +
               [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid +
               [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid +
               [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid +
               [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid +
               [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid +
               [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid +
               [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid -Replace "-" }

    $Chunks = [math]::Ceiling($FileSize/1kb)

    [io.file]::WriteAllText("$Path\$FileName","$(-Join (1..($Chunks)).foreach({ $Chunk.Invoke() }))")

    Write-Warning "New-RandomFile: $Path\$FileName"

}

If you don't care that all chunks are random, you can simply Invoke() the generation of the 1kb chunk once. This improves the speed drastically, but won't make the entire file random.

Function New-RandomFile_Fast {
    Param(
        $Path = (Resolve-Path '.').Path, 
        $FileSize = 1kb, 
        $FileName = [guid]::NewGuid().Guid + '.txt'
        ) 

    $Chunk = { [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid +
               [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid +
               [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid +
               [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid +
               [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid +
               [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid +
               [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid +
               [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid + [guid]::NewGuid().Guid -Replace "-" }
    $Chunks = [math]::Ceiling($FileSize/1kb)
    $ChunkString = $Chunk.Invoke()

    [io.file]::WriteAllText("$Path\$FileName","$(-Join (1..($Chunks)).foreach({ $ChunkString }))")

    Write-Warning "New-RandomFile: $Path\$FileName"

}

Timing all these versions with Measure-Command when generating a 10mb file:

Executing New-RandomFile: 35.7688241 seconds.

Executing New-RandomFile_1kChunks: 25.1463777 seconds.

Executing New-RandomFile_Fast: 1.1626236 seconds.

Marc Kellerman