36

The Reference Source page for stringbuilder.cs has this comment in the ToString method:

if (chunk.m_ChunkLength > 0)
{
    // Copy these into local variables so that they 
    // are stable even in the presence of ----s (hackers might do this)
    char[] sourceArray = chunk.m_ChunkChars;
    int chunkOffset = chunk.m_ChunkOffset;
    int chunkLength = chunk.m_ChunkLength;

What does this mean? Is ----s something a malicious user might insert into a string to be formatted?

xxbbcc
  • 16,930
  • 5
  • 50
  • 83
Micteu
  • 451
  • 5
  • 17
  • 10
    We need a hacker to anwer this question :) – 54l3d Jun 03 '15 at 22:11
  • 3
    I'm no hacker but I'm guessing that whatever `----s` means some word or phrase that ends with s, and the programmer just didn't want to give it away that easily – Zohar Peled Jun 03 '15 at 22:11
  • 3
    @54l3d, your name looks like L33Tspeak. I bet you'd be a good one to ask. – Micteu Jun 03 '15 at 22:12
  • 2
    I'm guessing the phrase (which @Jerron Vanneval's answer says was "race conditions") was elided when the Reference Source was released -- and that in the original MS source it was left intact. – Ross Presser Jun 03 '15 at 22:12
  • 2
    Duplicate of http://stackoverflow.com/questions/30595203/redacted-comments-in-mss-source-code-for-net (this is more generic and was asked earlier). – leppie Jun 04 '15 at 10:00
  • 1
    As leppie points out, [I asked this question just a few days ago](http://stackoverflow.com/q/30595203/24874). – Drew Noakes Jun 04 '15 at 11:54
  • Microsoft fell victim to one of the clbuttic blunders! The most famous of which is "never get involved in a land war in Asia," but only slightly less well-known is this: "Never run seek and destroy on the source code!" – CodesInChaos Jun 04 '15 at 13:52
  • I bet the answer is 'StackOverflows' ;) – MeanGreen Jun 04 '15 at 14:19
  • @CodesInChaos it took some research to fully figure out that joke. – usr Jun 04 '15 at 21:34

4 Answers4

41

The source code for the published Reference Source is pushed through a filter that removes objectionable content from the source. Verboten words are one, Microsoft programmers use profanity in their comments. So are the names of devs, Microsoft wants to hide their identity. Such a word or name is substituted by dashes.

In this case you can tell what used to be there from the CoreCLR, the open-sourced version of the .NET Framework. It is a verboten word:

// Copy these into local variables so that they are stable even in the presence of race conditions

Which was hand-edited from the original that you looked at before being submitted to Github, Microsoft also doesn't want to accuse their customers of being hackers, it originally said races, thus turning into ----s :)

Hans Passant
  • 922,412
  • 146
  • 1,693
  • 2,536
  • 5
    So *race* is a forbidden word? And *Which was hand-edited* means "automatically edited"? Because an human editor considering that *race* to mean *human race* seems to be a little strange – xanatos Jun 04 '15 at 07:42
  • 1
    No, you can tell it is not by looking at the CoreCLR source. The filter was pretty dumb, an early version also had a nasty bug that caused the source to be emitted twice. It just blindly applied a list of words that were considered risky. Sex, religion, race, politics, profanity. Compare to [Wikipedia guidelines](http://en.wikipedia.org/wiki/Wikipedia:Sex,_Religion_and_Politics). – Hans Passant Jun 04 '15 at 10:32
  • @HansPassant: That is not a guideline... [try this one instead](http://en.wikipedia.org/wiki/WP:CENSOR). – Kevin Jun 04 '15 at 14:09
  • Hmm, no, that's a guide to their readers, not their content contributors. OpenCLR has a contributing guide but does not mention profanity. Highly unlikely they'll pay attention to the pull request when you swear a lot, too much work. – Hans Passant Jun 04 '15 at 19:25
23

In the CoreCLR repository you have a fuller quote:

Copy these into local variables so that they are stable even in the presence of race conditions

Github

Basically: it's a threading consideration.

Jeroen Vannevel
  • 43,651
  • 22
  • 107
  • 170
  • 2
    For extra credit, someone should find more occurrences of "race condition" in the CoreCLR, and find corresponding occurrences of "----". – John Saunders Jun 03 '15 at 22:15
  • 3
    It's not exactly a fuller quote, since it is definitely different by more than just the `----`. – hatchet - done with SOverflow Jun 03 '15 at 22:16
  • 1
    Why definitely different? Different number of characters? Why do you assume character count remains the same over the transformation? – John Saunders Jun 03 '15 at 22:16
  • @hatchet: if you assume that each hyphen represents a single character. I am confident enough to say that both comments refer to the same idea behind it. – Jeroen Vannevel Jun 03 '15 at 22:20
  • 3
    I'm thinking it was redacted to avoid giving a hint about what kind of exploit they were trying to avoid. And why on earth would you tell people how many characters that exploit's name is if you're trying to hide it? – Dan Field Jun 03 '15 at 22:21
  • 6
    @DanField It's probably just an overly zealous find-and-replace in an attempt to ensure political correctness (specifically, to remove racism). – user253751 Jun 04 '15 at 00:12
  • @T.J.Crowder And that removes the comment from the source code how exactly? – user253751 Jun 04 '15 at 10:58
  • @immibis: As I said: You fix it in the source. See the bit before the first occurrence of the word "and" in my comment. – T.J. Crowder Jun 04 '15 at 11:31
  • @T.J.Crowder ... well then you appear to be agreeing that Microsoft should have done what they did, while simultaneously complaining about it? – user253751 Jun 04 '15 at 11:39
  • @immibis: You seem intent on misreading the above. 1. **Fix the source** (not redaction, rephrasing). 2. Deal with the person causing/allowing the issue. 3. No need to then have auto-redaction software. – T.J. Crowder Jun 04 '15 at 11:45
  • 6
    I think "race conditions" originally was shortened "races". Even if the filter wouldn't preserve the number of characters, it'd be very weird to gobble up "condition". – CodesInChaos Jun 04 '15 at 13:55
9

In addition to the great answer by @Jeroen, this is more than just a threading consideration. It's to prevent someone from intentionally creating a race condition and causing a buffer overflow in that manner. Later in the code, the length of that local variable is checked. If the code were to check the length of the accessible variable instead, it could have changed on a different thread between the time length was checked and wstrcpy was called:

        // Check that we will not overrun our boundaries. 
        if ((uint)(chunkLength + chunkOffset) <= ret.Length && (uint)chunkLength <= (uint)sourceArray.Length)
        {
            ///
            /// imagine that another thread has changed the chunk.m_ChunkChars array here!
           /// we're now in big trouble, our attempt to prevent a buffer overflow has been thawrted! 
           /// oh wait, we're ok, because we're using a local variable that the other thread can't access anyway.
            fixed (char* sourcePtr = sourceArray)
                string.wstrcpy(destinationPtr + chunkOffset, sourcePtr, chunkLength);
        }
        else
        {
            throw new ArgumentOutOfRangeException("chunkLength", Environment.GetResourceString("ArgumentOutOfRange_Index"));
        }
    }
    chunk = chunk.m_ChunkPrevious;
} while (chunk != null);

Really interesting question though.

Dan Field
  • 20,885
  • 5
  • 55
  • 71
2

Don't think that this is the case - the code in question copies to local variables to prevent bad things happening if the string builder instance is mutated on another thread.

I think the ---- may relate to a four letter swear word...

Rich O'Kelly
  • 41,274
  • 9
  • 83
  • 114
  • 2
    Genuinely curious what four letter swear word you think is something a hacker might "do" to cause instability in a more naïve implementation? – OJFord Jun 04 '15 at 13:17