StringBuilder comments
Eric Lippert's answer here, How does StringBuilder Work (asked about C#, but it applies to PowerShell because it's the same .NET class), says that internally it uses a linked list of parts. That means it does have to do something like ToString() in the background to convert that data structure into a usable string for printing.
Here's another StringBuilder discussion - BlueRaja's answer particularly.
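You can see the two stages from PowerShell directly - appends go into the internal chunk list, and nothing is flattened into a single string until ToString() is called. A minimal sketch (the "part" names are just illustrative):

```powershell
# Each Append writes into StringBuilder's internal chunk list;
# no single contiguous string exists yet.
$sb = [System.Text.StringBuilder]::new()
foreach ($i in 1..5) { [void]$sb.Append("part$i;") }

$sb.Length               # length is tracked without building the final string
$text = $sb.ToString()   # here the chunks are copied out into one string
$text                    # part1;part2;part3;part4;part5;
```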
Code comments
foreach ($iteration in 1..100) {
    $temp = "test"
    $temp = $null        # Which is better?
    # or
    Clear-Variable temp  # Which is better?
    $temp = "test2"
    [GC]::Collect()      # Is this needed?
}
$temp = $null does a simple assignment; Clear-Variable temp has to do cmdlet name resolution, launch the cmdlet, bind parameters, and get through the code for Clear-Variable's extra features even if you are not using them. But doing one assignment straight after the other - $temp = "test"; $temp = "test2" - is probably fine without clearing the value at all.
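If you want to see that cmdlet overhead rather than take it on faith, a rough timing sketch - the exact numbers will vary per machine and per run:

```powershell
# Compare the two clearing styles over many iterations.
# Expect the Clear-Variable loop to take noticeably longer - that is
# cmdlet-invocation overhead, not the clearing itself.
Measure-Command { foreach ($i in 1..10000) { $temp = "test"; $temp = $null } }
Measure-Command { foreach ($i in 1..10000) { $temp = "test"; Clear-Variable temp } }
```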
[GC]::Collect()
forces the garbage collector to run. That makes work happen, and doing work takes longer than doing nothing, so no, it's not needed to make anything faster at the small scale. It's never (?) needed in PowerShell because collection will always happen automatically at some point - but if you've loaded hundreds of MB or some GB into memory and no longer need it, calling [GC]::Collect()
might help free it up sooner, and might make some bigger data-processing script run a bit faster. Maybe.
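A sketch of the one case where it can be worth it - a deliberately large, hypothetical allocation you've finished with (the 100MB buffer is invented for the example):

```powershell
# Hypothetical big allocation you no longer need
$big = [byte[]]::new(100MB)
$before = [GC]::GetTotalMemory($false)

$big = $null        # drop the only reference first, or Collect() can't free it
[GC]::Collect()     # ask for a collection now instead of waiting for one

$after = [GC]::GetTotalMemory($false)
# $after should now be well below $before, since the buffer was reclaimed
```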
General comments
Wanting to know what happens in the background is respectable - you should; it's possible to do very slow things without realising it when there is a faster way.
But what @Tomalak is saying in the comments is sensible - PowerShell is not built to be as fast as possible. It's built to be convenient and readable for administration tasks; it's not a connection-multiplexing webserver or a bit-shifting 3D game engine. Its reason for existing is 'wrapping lots of work in a few commands'.
'Convenient' means you spend computer power to save human power. It does more, so you have to write less - i.e. in a way, it's meant to be slower as a deliberate trade-off to get more convenience. All scripting languages are.
'Readable' means preferring code that focuses on the task over supporting code that fiddles with behind-the-scenes mechanisms like triggering the garbage collector. Again, usability over performance.
And we are a long way above the CPU, just try:
measure-command { $x = 1 }
measure-command { $x = 1 }
and look at TotalMilliseconds: the first run takes 5 milliseconds, the next run 1 millisecond, for me. 80% of the runtime got knocked off just by doing it again. The run after that takes 1.3ms - 30% slower for no reason.
.Net JIT compilation, other tasks happening on your system, this is your micro-optimization of variable assignment making no difference to anything, the changes are lost in the noise.
Because of this effect, worrying about micro-performance in PowerShell is a bit of a waste of time.
But I do see that you can't know what's worth worrying about and what isn't until you've learned it, so Tomalak's dismissal "Stop worrying about the wrong stuff way ahead of time" is a bit of a catch-22 - you don't know what the wrong stuff is! "Write something" is great advice. Working code which solves your problem slowly is much better than spending that time procrastinating, worrying that your future code might be slow.
Write! When it's slow and annoying, then investigate to find the slowest parts and rewrite those. When you've done that enough, you'll avoid the slowest things when you write new code because you know they are patterns you often rewrite.
Now try
Measure-Command { $x = @(); foreach ($i in 1..10000) { $x += $i }}
Measure-Command { $x = foreach ($i in 1..10000) { $i }}
3.5 seconds, vs 0.015 seconds.
Wow.
Of course, 1..10000
generates a whole array in memory; surely we can do better by using a counter instead of generating the array:
Measure-Command { $x = for ($i=1; $i -le 10000; $i++) { $i }}
Wait - that's 0.03 seconds. Not only is it uglier and less convenient, it's also slower. Testing and counting at the PowerShell layer is worse than whatever 1..10000
is doing at the lower levels.
Is it worth worrying about? No, because 0.015 or 0.03 seconds is really saying "fast enough, go look at some other part of the code like reading the entire file ten times". Is 3 seconds even worth worrying about?
PowerShell optimization tends to go:
1. Find the most common traps which are slow ($thing += string or array concatenation in a loop, and loading big things entirely into memory before starting any processing) and change them. There aren't many, and they become quite easy to spot with practise - jump to where the loops are.
2. Rethink your algorithm so it takes less work to do, in any language. Much harder, but it gets you the bigger wins. As a quick handwave, any big tangle of loops comparing arrays against each other can often be changed to use Group-Object or hashtables; that tends to come up a lot in the kind of sorting and reporting PowerShell is used for.
3. Push some of it out to C#.
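As a sketch of the hashtable rethink, with made-up sample data: the nested-loop version rescans one list for every item of the other, while the hashtable version does one pass to build an index and one pass to probe it.

```powershell
# Hypothetical sample data: two overlapping ranges to match up
$listA = 1..10000
$listB = 5000..15000

# Slow, nested-loop style: -contains rescans $listB for every item (O(n*m))
# $slow = foreach ($a in $listA) { if ($listB -contains $a) { $a } }

# Faster, hashtable style: build the index once, then do O(1) lookups (O(n+m))
$lookup = @{}
foreach ($b in $listB) { $lookup[$b] = $true }
$fast = foreach ($a in $listA) { if ($lookup.ContainsKey($a)) { $a } }

$fast.Count   # 5001 matches: the overlap 5000..10000
```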
At the point where you're worrying about whether StringBuilder is calling .ToString() or not, and whether clearing a variable one way or another is hurting performance, you're either focusing on the wrong code while the slowest bits are somewhere else, or the overall design is problematic, or you're way past the point where you should have moved to another language, because PowerShell is too high-level to give you millisecond-level control.
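For completeness, a small sketch of pushing hot loops out to C# from inside a script via Add-Type - the FastSum class name and its Sum method are invented for the example:

```powershell
# Compile a tiny C# helper at runtime and call it from PowerShell;
# the loop then runs as compiled .NET code, not interpreted script.
Add-Type -TypeDefinition @"
public static class FastSum
{
    public static long Sum(int n)
    {
        long total = 0;
        for (int i = 1; i <= n; i++) { total += i; }
        return total;
    }
}
"@

[FastSum]::Sum(10000)   # 50005000
```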