I was browsing the source of the PluralizationService when I noticed something odd. In the class there are a couple of private dictionaries reflecting different pluralisation rules. For example:

    private string[] _uninflectiveWordList =
        new string[] { 
            "bison", "flounder", "pliers", "bream", "gallows", "proceedings", 
            "breeches", "graffiti", "rabies", "britches", "headquarters", "salmon", 
            "carp", "----", "scissors", "ch----is", "high-jinks", "sea-bass", 
            "clippers", "homework", "series", "cod", "innings", "shears", "contretemps", 
            "jackanapes", "species", "corps", "mackerel", "swine", "debris", "measles", 
            "trout", "diabetes", "mews", "tuna", "djinn", "mumps", "whiting", "eland", 
            "news", "wildebeest", "elk", "pincers", "police", "hair", "ice", "chaos",
            "milk", "cotton", "pneumonoultramicroscopicsilicovolcanoconiosis",
            "information", "aircraft", "scabies", "traffic", "corn", "millet", "rice", 
            "hay", "----", "tobacco", "cabbage", "okra", "broccoli", "asparagus", 
            "lettuce", "beef", "pork", "venison", "mutton",  "cattle", "offspring", 
            "molasses", "shambles", "shingles"};

What are the groups of four dashes in the strings? I did not see them handled in the code, so they're not some kind of template. The only thing I can think of is that they are censored expletives ('ch----is' would be 'chassis'), which in this case actually hurts readability. Has anyone else come across this? If I wanted to see the actual full list, how would I view it?

Patrick Hofman
Emmit
  • Don't know for certain, but my guess would be that it's some kind of placeholder as a wildcard (e.g. matching pattern that consists of ch, then 4 characters, then is would match). – Chris Disley Nov 23 '15 at 14:13
  • *"pneumonoultramicroscopicsilicovolcanoconiosis"* I'm guessing the tester who found that one got a good laugh out of the bug report, and the developer who fixed it laughed back... (it's the longest word in the English language according to Wikipedia) – Ron Beyer Nov 23 '15 at 14:18
  • My best guess would be a pattern match where the letters themselves didn't matter but the length did, for example: cat, hat, bat if it didn't match the other cases could be lumped together in the dash pattern and pluralized the same. Just a guess though. – Stephen Brickner Nov 23 '15 at 14:28
  • That's WAY more grammar-related code than I ever expected to find in an ORM library. – Bradley Uffner Nov 23 '15 at 14:31
  • I can only think of one word (Trapezium) that matches t----zium (which is another word from the same file), so it does look like it is censoring certain words. – sgmoore Nov 23 '15 at 14:31
  • http://stackoverflow.com/questions/30631626/what-does-s-mean-in-the-context-of-stringbuilder-tostring/30631947#30631947 – Hans Passant Nov 23 '15 at 14:41
  • "Two cabbages" is really found to be less likely to be correct than "two cabbage"? – Jon Hanna Nov 23 '15 at 14:58

1 Answer

Using Reflector to look at the decompiled code, I can verify that the compiled version doesn't have "----" in there, so it does indeed seem to be some kind of censorship applied somewhere along the way. The decompiled code has this in the constructor:

    this._uninflectiveWordList = new string[] { 
        "bison", "flounder", "pliers", "bream", "gallows", "proceedings", "breeches", "graffiti", "rabies", "britches", "headquarters", "salmon", "carp", "herpes", "scissors", "chassis", 
        "high-jinks", "sea-bass", "clippers", "homework", "series", "cod", "innings", "shears", "contretemps", "jackanapes", "species", "corps", "mackerel", "swine", "debris", "measles", 
        "trout", "diabetes", "mews", "tuna", "djinn", "mumps", "whiting", "eland", "news", "wildebeest", "elk", "pincers", "police", "hair", "ice", "chaos", 
        "milk", "cotton", "pneumonoultramicroscopicsilicovolcanoconiosis", "information", "aircraft", "scabies", "traffic", "corn", "millet", "rice", "hay", "hemp", "tobacco", "cabbage", "okra", "broccoli", 
        "asparagus", "lettuce", "beef", "pork", "venison", "mutton", "cattle", "offspring", "molasses", "shambles", "shingles"
    };

As you can see, the censored words are "herpes", "chassis" and "hemp" (if I've followed along correctly). I personally don't think any of them need censoring, which suggests that some kind of automated system is doing it. I would assume that the original source contains the full words, rather than them being added in some kind of precompile merge (if nothing else, because "----" really isn't enough to say what it should be replaced with). I'd imagine that, for some reason, the reference website receives a censored copy.
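To double-check which entries differ, the two lists can be compared position by position. The following is only a sketch: the class name `CensoredEntryDiff` and its members are made up for illustration, and the arrays are short excerpts copied from the two listings above rather than the full lists.

```csharp
using System;
using System.Linq;

static class CensoredEntryDiff
{
    // Excerpt of the list as shown in the published source (censored).
    static readonly string[] PublishedList =
    {
        "salmon", "carp", "----", "scissors", "ch----is", "high-jinks",
        "rice", "hay", "----", "tobacco"
    };

    // The same excerpt as it appears in the decompiled constructor.
    static readonly string[] DecompiledList =
    {
        "salmon", "carp", "herpes", "scissors", "chassis", "high-jinks",
        "rice", "hay", "hemp", "tobacco"
    };

    // Pairs entries up by position and keeps only the pairs that differ.
    public static (string Published, string Decompiled)[] Differences() =>
        PublishedList
            .Zip(DecompiledList, (p, d) => (Published: p, Decompiled: d))
            .Where(pair => pair.Published != pair.Decompiled)
            .ToArray();

    static void Main()
    {
        foreach (var (published, decompiled) in Differences())
            Console.WriteLine($"{published} -> {decompiled}");
    }
}
```

For this excerpt the differing pairs are `----`/`herpes`, `ch----is`/`chassis` and `----`/`hemp`. A positional comparison only works because both listings keep the entries in the same order.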

Hans Passant also linked in the comments to an answer to a very similar question: What does ----s mean in the context of StringBuilder.ToString()? It explains that "The source code for the published Reference Source is pushed through a filter that removes objectionable content from the source".
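The actual filter is not public, so the following is only a guess at the kind of naive substring replacement that would produce this result. The class `NaiveSourceFilter`, its `Censor` method and the blocklist are all hypothetical; the blocklist was simply chosen to reproduce the three changes seen here.

```csharp
using System;

static class NaiveSourceFilter
{
    // Hypothetical blocklist: the real filter's word list is unknown.
    static readonly string[] Blocked = { "herpes", "hemp", "ass" };

    // Replaces every occurrence of a blocked substring with four dashes,
    // ignoring case and without checking word boundaries.
    public static string Censor(string text)
    {
        foreach (var word in Blocked)
        {
            int index;
            while ((index = text.IndexOf(word, StringComparison.OrdinalIgnoreCase)) >= 0)
            {
                text = text.Remove(index, word.Length).Insert(index, "----");
            }
        }
        return text;
    }

    static void Main()
    {
        foreach (var word in new[] { "herpes", "chassis", "hemp", "salmon" })
            Console.WriteLine($"{word} -> {Censor(word)}");
    }
}
```

Because the sketch matches substrings rather than whole words, "chassis" comes out as "ch----is", while words that contain no blocked substring, such as "salmon", pass through unchanged. A filter that respected word boundaries would have left "chassis" alone.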

Chris
  • ass, not chassis. It will probably make somebody blush. – Hans Passant Nov 23 '15 at 14:52
  • You are right that "ass" is what was removed. I was referring to what the full words were. – Chris Nov 23 '15 at 14:54
  • @PeterM: Seems that way but I'm not really in a position to say for sure what they did and why it seemed to make such a mess of it without insider knowledge of the process they use. It seems likely though that it is an automated process being rubbish. At least it reassures me that Skynet isn't likely to be a problem for a while. ;-) – Chris Nov 23 '15 at 14:57
  • @PeterM do you mean "cl----ic" filtering? – Dr Rob Lang Nov 23 '15 at 14:58
  • @RobLang Ironically (and by that I mean clbuttic being mentioned on SO) http://blog.codinghorror.com/obscenity-filters-bad-idea-or-incredibly-intercoursing-bad-idea/ – Peter M Nov 23 '15 at 14:59
  • I remember the chat for an online browser game running a filter that just stripped obscenity. Referring to an assassin (or "an in") became rather hard, which was troublesome since a unit in the game was called an assassin... – Chris Nov 23 '15 at 15:03