
I'm trying to learn efficiency in programming/scripting and I know stuff happens in the background that I may not be aware of.

$sb = New-Object System.Text.StringBuilder;
$sb.Append("Hello World");
Write-Host $sb.Clear().Append("Hello World 2");

Is this efficient or does it do a .ToString() in the background?

and

for (iteration) {

  $temp = "test";
  $temp = $null # Which is better?
  # or 
  Clear-Variable temp # Which is better?
  $temp = "test2"
  [GC]::Collect(); # Is this needed?
}
  • Do you have a performance problem? Have you measured? How large is your input? What do you do to it? Have you identified string concatenation as the slow part? Because in any other case this question is a waste of time, harsh as it sounds. Make it work. Then make it fast. Unless it is already fast. – Tomalak Sep 30 '17 at 22:17
  • Here is a tip to help you identify whether things make a difference or not - use `Measure-Command` and see if your changes make any difference (spoiler - they won't). Keep in mind that Powershell is a scripting language. You are not optimizing inner loops in assembly here. You are several layers of abstraction away from the processor. And still none of this will take longer than a few microseconds. Stop worrying about the wrong stuff way ahead of time. Write something. – Tomalak Sep 30 '17 at 22:30
  • The posted StringBuilder code snippet doesn't really make sense, because StringBuilder is only needed when you concatenate thousands or hundreds of thousands of strings, and at the end you call ToString to produce a combined string. – wOxxOm Sep 30 '17 at 23:55

1 Answer


StringBuilder comments

Eric Lippert's answer to How does StringBuilder Work (asked about C#, but it applies to PowerShell because it's the same .Net class) says that internally it uses a linked list of parts.

That means it does have to do something like ToString() in the background to convert that data structure into a usable string for printing.

Here's another StringBuilder discussion - see BlueRaja's answer in particular.
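
If you do use StringBuilder, the pattern it's built for is appending lots of small pieces and converting to a string once at the end. A minimal sketch (the loop body is just made-up filler):

$sb = New-Object System.Text.StringBuilder
foreach ($i in 1..1000) {
    # Append returns the builder itself, so [void] discards the return value
    # to stop it leaking into the pipeline output
    [void]$sb.Append("line $i`n")
}
$result = $sb.ToString()  # the one explicit conversion back to a normal string
Write-Host $result.Length

Handing $sb straight to Write-Host works too, but only because PowerShell calls ToString() on it for you - the conversion still happens, you just don't see it.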

Code comments

for (iteration) {

  $temp = "test";
  $temp = $null # Which is better?
  # or 
  Clear-Variable temp # Which is better?
  $temp = "test2"
  [GC]::Collect(); # Is this needed?
}
  1. $temp = $null does a simple assignment; Clear-Variable temp has to do cmdlet name resolution, launch a cmdlet, bind parameters, and run through code that supports features you aren't even using. But assigning one value and then the next - $temp = "test"; $temp = "test2" - is probably fine without clearing the value at all. (There's a quick measurement sketch after this list.)

  2. [GC]::Collect() forces the garbage collector to run - that makes work happen, and doing work takes longer than doing nothing, so no, it's not needed to make anything faster at the small scale. It's never (?) needed in PowerShell because collection will always happen automatically at some point - but if you've loaded hundreds of MB or some GB into memory and no longer need it, calling [GC]::Collect() might free it up sooner, and might make some bigger data-processing script run a bit faster. Maybe.
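
If you want to see the cmdlet-invocation overhead for yourself, compare the two directly with Measure-Command - a rough sketch, and the absolute numbers will vary from run to run:

# plain assignment
Measure-Command { foreach ($i in 1..10000) { $temp = "test"; $temp = $null } }

# cmdlet call on every iteration
Measure-Command { foreach ($i in 1..10000) { $temp = "test"; Clear-Variable temp } }

Expect the Clear-Variable version to be slower per iteration, and expect both to be far too fast to matter unless the loop runs millions of times.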

General comments

Wanting to know what happens in the background is respectable - you should; it's possible to do very slow things without realising it when there is a faster way.

But what @Tomalak is saying in the comments is sensible - PowerShell is not built to be as fast as possible, it's built to be convenient and readable for administration tasks. It's not a connection-multiplexing webserver or a bit-shifting 3D game engine; its reason for existing is 'wrapping lots of work in a few commands'.

'Convenient' means you spend computer power to save human power: it does more, so you write less. In a way, it's meant to be slower, as a deliberate trade-off for more convenience. All scripting languages are.

'Readable' means preferring code that focuses on the task over supporting code that fiddles with behind-the-scenes mechanisms like triggering the garbage collector. Again, usability over performance.

And we are a long way above the CPU, just try:

Measure-Command { $x = 1 }
Measure-Command { $x = 1 }

and look at TotalMilliseconds: for me, the first run takes 5 milliseconds and the second takes 1 millisecond - 80% of the runtime knocked off just by doing it again. The next run takes 1.3ms - 30% slower than the second, for no reason.

.Net JIT compilation and other tasks happening on your system swamp your micro-optimization of variable assignment; whatever difference it makes is lost in the noise.
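
If you do want slightly more trustworthy numbers, time many iterations in one block and average - a quick sketch:

$runs = 100000
$total = (Measure-Command { foreach ($i in 1..$runs) { $x = 1 } }).TotalMilliseconds
"{0:N6} ms per assignment" -f ($total / $runs)

Even then, treat the result as an order-of-magnitude hint, not a precise measurement.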

Because of this effect, worrying about micro-performance in PowerShell is a bit of a waste of time.

But I do see that you can't know what's worth worrying about and what isn't until you've learned it, so Tomalak's dismissal, "Stop worrying about the wrong stuff way ahead of time", is a bit of a catch-22 - you don't know what the wrong stuff is! "Write something" is great advice. Working code which solves your problem slowly is much better than spending that time procrastinating, worrying that your future code might be slow.

Write! When it's slow and annoying, then investigate to find the slowest parts and rewrite those. When you've done that enough, you'll avoid the slowest things when you write new code because you know they are patterns you often rewrite.

Now try

Measure-Command { $x = @();  foreach ($i in 1..10000) { $x += $i }}

Measure-Command { $x = foreach ($i in 1..10000) { $i }}

3.5 seconds vs 0.015 seconds - because $x += $i doesn't extend the array in place; each time through the loop it allocates a brand-new array and copies every existing element into it.

Wow.

Of course, 1..10000 generates a whole array in memory up front; surely we can do better with a counter instead of generating the array:

Measure-Command { $x = for ($i=1; $i -le 10000; $i++) { $i }}

Wait - that's 0.03 seconds. Not only is it uglier and less convenient, it's also slower. Testing and counting at the PowerShell layer is worse than whatever 1..10000 is doing at the lower levels.

Is it worth worrying about? No, because 0.015 or 0.03 seconds really says "fast enough, go look at some other part of the code, like the bit that reads the entire file ten times". Is even the 3 seconds worth worrying about?

PowerShell optimization tends to go:

  1. Find the most common traps which are slow ($thing += for string or array concatenation in a loop, and loading big things entirely into memory before starting any processing) and change them. There aren't many, and they become quite easy to spot with practise - jump to where the loops are.

  2. Rethink your algorithm so it takes less work to do, in any language. Much harder, but it gets you the bigger wins. As a quick handwave, any big tangle of loops comparing arrays against each other can often be changed to use Group-Object or hashtables; that comes up a lot in the kind of sorting and reporting PowerShell is used for (see the sketch after this list).

  3. Push some of it out to C#.
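
As an illustration of point 2, take a hypothetical matching task (the data and names here are made up): instead of comparing every item in one array against every item in another, build a hashtable index once and do cheap lookups against it:

$users  = foreach ($i in 1..5000) { [pscustomobject]@{ Id = $i; Name = "user$i" } }
$logins = foreach ($i in 1..5000) { [pscustomobject]@{ UserId = (Get-Random -Minimum 1 -Maximum 5001) } }

# Slow shape: nested comparison, every login scanned against every user
# $matched = foreach ($l in $logins) { $users | Where-Object Id -eq $l.UserId }

# Faster shape: index users by Id once, then each lookup is a single hashtable hit
$byId = @{}
foreach ($u in $users) { $byId[$u.Id] = $u }
$matched = foreach ($l in $logins) { $byId[$l.UserId] }

The work drops from 5000 x 5000 comparisons to one pass over each collection.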

At the point where you're worrying about whether StringBuilder is calling .ToString() or not, and whether clearing a variable one way or another is hurting performance, you're either focusing on the wrong code while the slowest bits are somewhere else, or the overall design is problematic, or you're way past the point where you should have moved to another language, because PowerShell is too high-level to give you millisecond-level control.

– TessellatingHeckler