9

Consider these two definitions for GetWindowText. One uses a string for the buffer, the other uses a StringBuilder instead:

[DllImport("user32.dll", CharSet = CharSet.Auto, SetLastError = true)]
public static extern int GetWindowText(IntPtr hWnd, StringBuilder lpString, int nMaxCount);

[DllImport("user32.dll", CharSet = CharSet.Auto, SetLastError = true)]
public static extern int GetWindowText(IntPtr hWnd, string lpString, int nMaxCount);

Here's how you call them:

var windowTextLength = GetWindowTextLength(hWnd);

// You can use either of these as they both work
var buffer = new string('\0', windowTextLength);
//var buffer = new StringBuilder(windowTextLength);

// Add 1 to windowTextLength for the trailing null character
var readSize = GetWindowText(hWnd, buffer, windowTextLength + 1);

Console.WriteLine($"The title is '{buffer}'");

They both seem to work correctly whether I pass in a string, or a StringBuilder. However, all the examples I've seen use the StringBuilder variant. Even PInvoke.net lists that one.

My guess is the thinking goes 'In C# strings are immutable, therefore use StringBuilder', but since we're poking down to the Win32 API and messing with the memory locations directly, and that memory buffer is for all intents and purposes (pre)allocated (i.e. reserved for, and being currently used by the string) by the nature of it being assigned a value at its definition, that restriction doesn't actually apply, hence string works just fine. But I'm wondering if that assumption is wrong.

I don't think so because if you test this by increasing the buffer by say 10, and change the character you're initializing it with to say 'A', then pass in that larger buffer size to GetWindowText, the string you get back is the actual title, right-padded with the ten extra 'A's that weren't overwritten, showing it did update that memory location of the earlier characters.

So provided you pre-initialize the strings, can't you do this? Could those strings ever 'move out from under you' while using them because the CLR is assuming they're immutable? That's what I'm trying to figure out.

Mark A. Donohoe
  • 28,442
  • 25
  • 137
  • 286
  • 2
    This could be written up as an answer, however it contains all the information you need https://limbioliong.wordpress.com/2011/11/01/using-the-stringbuilder-in-unmanaged-api-calls/ – TheGeneral Aug 28 '18 at 00:57
  • I'm not sure about the comment "[...] the managed string type is not usable. A managed string cannot have its internal buffer pre-allocated and then passed to unmanaged code to be filled." as I can clearly prove that statement false. You can too. Increase the size of the buffer to be 10 greater than needed. Change the char from null ('\0') to something visible, like 'A'. Pass the size of this now-larger buffer into the call to 'GetWindowText'. The string you get back is your string (i.e. with all the 'A's) with the first part overwritten by the Win32 API call, thus proving that false. – Mark A. Donohoe Aug 28 '18 at 01:01
  • Indeed good point – TheGeneral Aug 28 '18 at 01:04
  • Adding what I mean... no it can't *just* be 'pre-allocated' per se, but it can be pre-defined, which is allocation+assignment/writing (as opposed to allocating which is reserving but not yet using). In this case, I fill it with the null character. which is essentially the same thing as allocating the size I need. Then the API just overwrites my data. That's what I think he got wrong. – Mark A. Donohoe Aug 28 '18 at 01:05
  • 1
    Well it **is** pre-allocating it, yeah its a little misleading, which is why we don't learn to code from wordpress pages – TheGeneral Aug 28 '18 at 01:06
  • Well he makes statements but doesn't back up the 'why' which is what I'm actually after. Without him backing that statement up, paired with what appears to be a demonstrable example disproving it, I'm right back to my original question. – Mark A. Donohoe Aug 28 '18 at 01:07
  • You can argue that using StringBuilder is cleaner, because strings are supposed to be immutable which seems to be violated here and thus it’s something of an unexpected behavior. – ckuri Aug 28 '18 at 02:00
  • 1
    Well not quite because strings are supposed to be immutable *within the .NET runtime*. The nature of using P/Invoke calls implicitly means you are peeking into the underpinnings of it and going around those protections. Technically using StringBuilder in the way you have to is doing the same thing. You're not 'building' a string. You're overwriting pre-allocated space which is also a violation of .NET. I do get what you're saying though, but again, I counter, if you're doing P/Invoke, you should know you're not in .NET-land anymore. – Mark A. Donohoe Aug 28 '18 at 02:03
  • Just because your code hasn't failed yet doesn't mean it is correct. I'm not sure what you are trying to achieve here. Relying on implementation details and chance doesn't seem very practical. – David Heffernan Aug 28 '18 at 06:49
  • Well, if you use `string`, its length won't be updated by P/Invoke. You'll end up with a padded string, and that's clearly not convenient at all. – Lucas Trzesniewski Aug 28 '18 at 12:27
  • I just tested this by purposely padding the string itself with an extra 20 characters and changing the pad char to 'X', but leaving the values I passed to `GetWindowText` the same. I incorrectly assumed `GetWindowText` would write in the null-terminating character, which the string would respect, but it doesn't. It displays it as '?' and I see all the extra padded characters. I don't see that with the same test using `StringBuilder`. This is the only proof I've seen why `string` is wrong and `StringBuilder` is correct. Can you please put this in an answer so I can mark it as accepted? – Mark A. Donohoe Aug 28 '18 at 15:59
  • It would be nice if there were a managed debugging assistant that warns about this mistake. But there is just too much code around that gets its wrong and gets away with it anyway. Code that started life in the VB6 days, there is quite a lot of it. At least they did *something* about it, you can no longer replace String.Empty :) – Hans Passant Aug 28 '18 at 16:54
  • 1
    The CLR optimizes on the assumption that strings are immutable. For example, if you mutate a string that is being used as a dictionary key, the dictionary will stop working properly. – Raymond Chen Aug 28 '18 at 21:10

2 Answers2

3

First off, pre-allocated is a misleading word in current context.The string is nothing different than just another .Net immutable string, and is as immutable as Hugh Jackman in real life. I believe OP knows this already.

In fact:

// You can use either of these as they both work
var buffer = new string('\0', windowTextLength);

is exactly same as:

// assuming windowTextLength is 5
var buffer = "\0\0\0\0\0"; 

Why shouldn't we use String/string and instead use StringBuilder for passing callee modifiable arguments to Interop/Unmanaged code? Are there specific scenarios where it will fail?

Honestly, I found this an interesting question and tested a few scenarios, by writing a custom Native DLL that takes a string and StringBuilder, while I force garbage collection, by forcing GC in different thread and so on. My intention was to force-relocate the object while its address was passed to an external library though PInvoke. In all cases, the object's address remained same even when other objects relocated. On research I found this by Jeffrey himself: The Managed Heap and Garbage Collection in the CLR

When you use the CLR’s P/Invoke mechanism to call a method, the CLR pins the arguments for you automatically and unpins them when the native method returns.

So takeaway is we can use it because it seems to work. But should we? I believe No:

  1. Because its clearly mentioned in the docs, Fixed-Length String Buffers. So string works for now, may not work in future releases.
  2. Because StringBuilder is library-provided mutable type, and logically it makes more sense to allow a mutable type to be modified vs an immutable type (string).
  3. There's a subtle advantage. When using StringBuilder, we're pre-allocating capacity, and not string. What this does is, we get rid of additional steps to trim/sanitize the string, and also not bother about terminating null character.
Mark A. Donohoe
  • 28,442
  • 25
  • 137
  • 286
subdeveloper
  • 1,381
  • 8
  • 21
  • 1
    `"\0\0\0\0\0"` is quite different from `new string('\0', 5)`. The former is interned, the latter is not. Modifying an interned string is a recipe for a headache. – Lucas Trzesniewski Aug 28 '18 at 17:02
  • Lucas - I meant from immutability point of view. Though thanks for specifying, this will add value for reader. – subdeveloper Aug 28 '18 at 17:10
  • Yep, it's not pre-allocated, it's full-on allocated and used. But the Win32 APIs don't care if it's used because it's just gonna overwrite it anyway so same difference to Win32. Still, as Lucas pointed out, the real issue is that managed code doesn't use (or even care about) the null terminating character to determine length. It uses a variable which the Win32 APIs *don't* update, so while the underlying memory is ultimately correctly written, unless you defined a string with the exact size you need prior to p/invoking, you end up with the length variable out of sync back on the managed side. – Mark A. Donohoe Aug 29 '18 at 00:42
1

If you pass a string to a function using P/Invoke, the CLR will assume the function will read the string. For efficiency, the string is pinned in memory and a pointer to the first character is passed to the function. No character data needs to be copied this way.

Of course, the function can do whatever it wants to the data in the string, including modifying it.

This means the function will overwrite the first few characters without issue, but buffer.Length will remain unchanged and you'll end up with the existing data at the end of the string still present in the string. .NET strings store their length in a field. They are also null-terminated, but the null terminator is only used as a convenience for interoperability with C code and has no effect in managed code.

Using such a string wouldn't be convenient as unless you pre-defined the string's size to perfectly match where the null-terminated character will ultimately be once written, .NET's length field will be out of sync with the underlying data.

Also, it's better this way, since changing the length of a string would certainly corrupt the CLR heap (the GC wouldn't be able to walk the objects). Strings and arrays are the only two object types that don't have a fixed size.

On the other hand, if you pass a StringBuilder through P/Invoke, you're explicitly telling the marshaler the function is expected to write to the instance, and when you call ToString() on it, it does update the length based on the null-termination character and everything is perfectly in sync.

Better use the right tool for the job. :)

Mark A. Donohoe
  • 28,442
  • 25
  • 137
  • 286
Lucas Trzesniewski
  • 50,214
  • 11
  • 107
  • 158
  • 1
    _"you're explicitly telling the marshaler the function is expected to write to the instance"_ - correct but the behavior is same as `string`, in cases when your callee overwrites past the allocated capacity. Tested it! – subdeveloper Aug 28 '18 at 17:29
  • @subdeveloper of course, you still need to respect the contract - which here states that you're supplying a fixed-length buffer to the function. – Lucas Trzesniewski Aug 28 '18 at 18:05
  • Lucas, this is *very* important information (about the null-termination character and the separate length value) that had I known that up front, would have answered the question before I posted it. This is spot-on why StringBuilder is the way to go. However, I see you have a rep of 41k but SubDeveloper only has 220, and I'm impressed that he actually built a native DLL so I kinda want to split the points, but since you can't do that, mind if I give it to him? Trying to do the right thing here, but again, this is the main reason. – Mark A. Donohoe Aug 29 '18 at 00:29
  • Actually, on second thought, his really doesn't have the critical piece about the length variable, so I really have to give this to you. That's the defining reason not to use `string`. Good job, mate! – Mark A. Donohoe Aug 29 '18 at 00:44
  • @MarqueIV It was very thoughtful of you to consider the research and rep, but you did the right thing by marking the correct answer. We're all here to learn and grow, and I indeed learned something new yesterday from your question, and obviously from Lucas' answer. – subdeveloper Aug 29 '18 at 06:09
  • Thanks! I appreciate the acknowledgement for doing that. Honestly, considering how you researched this, I'm surprised your rep is as low as it is though! You're clearly not new at this thing. :) – Mark A. Donohoe Aug 29 '18 at 06:58
  • @MarqueIV you choose which question to mark as accepted - I have enough points anyway. Also, if you want to reward subdeveloper's effort you can award him a bounty :) – Lucas Trzesniewski Aug 29 '18 at 08:15