-1

Edit Thank you to multiple people's answers which incrementally lead to the solution. Disabling my anti-virus allowed any of the previously suspicious delimiters to now finish instantly. I noticed that my BitDefender was off by one minor version to the version on one of my other properly-functioning computers. I've reinstalled it, verified the versions match on my other computer, and everything is working normally.

Original

Just today, I have found that it is taking a really long time to write trivial files that have a handful of commas compared to others. I've only tested out a few but have found that comma, period, backslash take a long time while tabs, $, or @ do not.

        List<string> lines = new List<string>();
        string delimiter = ",";
        string example = string.Join(delimiter, new[] { "qwasdasdasde", "qwasdasdasde", "qwasdasdasde", "qwasdasdasde", "qwasdasdasde", "qwasdasdasde", "qwasdasdasde" });
        for (int i = 0; i < 30; i++)
        {
            lines.Add(example);
        }

        Console.WriteLine($"Delimiter: '{delimiter}'");
        for (int i = 1; i <= 10; i++)
        {
            Stopwatch stopwatch = Stopwatch.StartNew();
            File.WriteAllLines("test.txt", lines);
            stopwatch.Stop();
            Console.WriteLine($"Trial {i}: {stopwatch.Elapsed}");
        }

Here's the output of running with a comma delimiter which outrageously does not finish instantly.

    Delimiter: ','
    Trial 1: 00:00:00.6407826
    Trial 2: 00:00:00.6414110
    Trial 3: 00:00:00.6520452
    Trial 4: 00:00:00.6511121
    Trial 5: 00:00:00.6879321
    Trial 6: 00:00:00.6000476
    Trial 7: 00:00:00.6409915
    Trial 8: 00:00:00.6424960
    Trial 9: 00:00:00.6544160
    Trial 10: 00:00:00.6418019

Here's the output running with "@", which finishes so quickly that I couldn't kill the process even if I tried.

    Delimiter: '@'
    Trial 1: 00:00:00.0033232
    Trial 2: 00:00:00.0020682
    Trial 3: 00:00:00.0007879
    Trial 4: 00:00:00.0020546
    Trial 5: 00:00:00.0009417
    Trial 6: 00:00:00.0006485
    Trial 7: 00:00:00.0016277
    Trial 8: 00:00:00.0006289
    Trial 9: 00:00:00.0006502
    Trial 10: 00:00:00.0005868

I tried this on other computers and they all finished instantaneously regardless of delimiter. What in the world could be happening?

Edit: Results from Enigmativity's code below I ran this for 10 iterations because it was taking too long given the very weird issue I'm having. As you can see, most of them are pretty instant but oddly not for comma, period and '<'

    '@'       00:00:00.0251197
    '#'       00:00:00.0207233
    '$'       00:00:00.0373336
    '<'       00:00:06.6117783
    ','       00:00:06.6047638
    '\'       00:00:00.0335935
    '.'       00:00:08.8566411
    '\t'    00:00:00.0388599
    ' '     00:00:00.0295777
Eddie
  • 163
  • 1
  • 3
  • 11
  • 5
    I am intrigued to find out what kind of answers you get to this question. Fave'd! – Brien Foss Mar 24 '18 at 04:49
  • 1
    Don't use `DateTime` to do performance testing as it is not terribly accurate. Always use `Stopwatch`. – Enigmativity Mar 24 '18 at 05:08
  • I don't not get the same results as you do when I run your code with `','` or `'@'`. Everything runs about the same. Can you please verify your results? – Enigmativity Mar 24 '18 at 05:11
  • How do you have your configuration project (release or debug)? I execute your code in release mode and the elapsed time is aprox. the same. Could you attempt run your executable in another computer? – Moises Quintero Orea Mar 24 '18 at 05:20
  • It is running in debug mode, but the point here is that running this with either delimiter should take the same amount of time, yet it isn't. I tried this on another computer and both delimiters finish in the same instantaneous fashion. My question is why using a comma takes so long just on my computer. – Eddie Mar 24 '18 at 05:31
  • @Eddie - Try running my code that I just posted. See what results you get then. – Enigmativity Mar 24 '18 at 05:35
  • @Enigmativity Thanks for pointing that out, but I don't think the timing method matters here because we're talking about 1 second vs a split second. The expected result is exactly what you've experienced: that they both finish in the same amount of time, which happens on my other computers. My question is just why my current computer is experiencing this discrepency. – Eddie Mar 24 '18 at 05:36
  • @Enigmativity I've updated my question to show the code and results using Stopwatch. My question still remains why the "@" finishes instantly while the comma takes doesn't. – Eddie Mar 24 '18 at 05:41
  • 3
    If you have antivirus, it may be hooked into your file I/O. Maybe turn it off and run the test again. It could be scanning with regex or other string matching algorithms that perform differently for commas, I suppose. (wild guess) – John Wu Mar 24 '18 at 06:35
  • I am unable to repro what you are seeing. What version of the .NET Framework are you running? What version of Visual Studio? Debug or Release build? Are you running on SSD or spinning rust? – mjwills Mar 24 '18 at 06:35
  • I am unable to repro this on other computers I have as well, so my question is why my current computer is doing this. .NET version 4.6.1. Running with Release build. Using an SSD although it seems like spinning rust for the commas. – Eddie Mar 24 '18 at 06:42
  • 3
    OH MY GOD. @JohnWu, that was it!! I just disabled my anti virus and the comma finishes instantly. I must have a different version of it (BitDefender) on my other PCs. – Eddie Mar 24 '18 at 06:43
  • @mjwills Well, I just found the solution from an answer here, so... very meaningful advice. – Eddie Mar 24 '18 at 06:44

3 Answers3

4

Edit

As I said, I think you are observing side effects of other things. Your system IO can be affected by so many things. You need more than 10 run to start making assumptions.

In addition you have JIT and other things at play. I couldn't insist more on the importance of putting more efforts than you did when trying to assess any performance issue.

A quick rewrite of your test with Benchmark dotnet gives very different results:

public class JoinAndWrite
{
    public JoinAndWrite()
    {
    }

    [Benchmark]
    public void Comma() => SharedAlgorithm(",");

    [Benchmark]
    public void At() => SharedAlgorithm("@");

    [Benchmark]
    public void Dollar() => SharedAlgorithm("$");


    private void SharedAlgorithm(string delimiter)
    {
        var filenameBase = ((int)delimiter[0]).ToString();

        List<string> lines = new List<string>();
        string example = string.Join(delimiter, new[] { "qwasdasdasde", "qwasdasdasde", "qwasdasdasde", "qwasdasdasde", "qwasdasdasde", "qwasdasdasde", "qwasdasdasde" });
        for (int i = 0; i < 30; i++)
        {
            lines.Add(example);
        }

        for (int i = 1; i <= 10; i++)
        {
            File.WriteAllLines($"{filenameBase}-_attempt-{i}.txt", lines);
        }
    }
}

public class Program
{
    public static void Main(string[] args)
    {
        var summary = BenchmarkRunner.Run<JoinAndWrite>();
    }
}

To run it:

dotnet run -c Release

And the results

 Method |      Mean |     Error |    StdDev |
------- |----------:|----------:|----------:|
  Comma |  8.073 ms | 0.4270 ms | 1.2591 ms |
     At |  9.953 ms | 0.2227 ms | 0.6389 ms |
 Dollar | 10.481 ms | 0.2564 ms | 0.7438 ms |

As you can see there's no significant differences on the mean especially considering the standard deviation. Add the Error margin and you can pretty much assume they are all in the same range the variability is due to external factors. (Other processes consuming processing power and IO for instance).

Original Answer

I think you are not measuring what you think you are measuring and you are doing the measurement part wrong.

First you are using DateTime.Now which has a resolution of about 16 milliseconds. You should be using Stopwatch which can leverage high resolution apis.

Then, you are measuring one cycle of write only with that resolution which guaranties that you'll get it wrong.

I addition, make sure you are using release builds without any debugger or development tooling attached.

Getting performance tests right is hard. I strongly recommand you search and read about it. You can start by reading about the amazing Benchmark dotnet library. It should put you on the right direction.

Cyprien Autexier
  • 1,948
  • 16
  • 20
  • I've updated the code and results with using Stopwatch. My question still remains as to why the "@" delimiter finishes instantly while the comma doesn't. – Eddie Mar 24 '18 at 05:43
  • @Eddie please, consider my update. – Cyprien Autexier Mar 24 '18 at 06:07
  • Thank you for taking the time to do that. I'm having issues installing BenchmarkDotNet. Install-Package : Could not install package 'BenchmarkDotNet 0.10.13'. You are trying to install this package into a project that targets '.NETFramework,Version=v4.5.2', but the package does not contain any assembly references or content files that are compatible with that framework. For more information, contact the package author. At line:1 char:1 + Install-Package BenchmarkDotNet – Eddie Mar 24 '18 at 06:22
  • It's dotnet 4.6+ – Cyprien Autexier Mar 24 '18 at 06:23
  • And don't forget to read the documentation especially : http://benchmarkdotnet.org/Guides/GoodPractices.htm – Cyprien Autexier Mar 24 '18 at 06:24
  • Oh... *face palm*. Sorry, running it now. – Eddie Mar 24 '18 at 06:28
  • Thank you again for coming up with that but someone found the solution (it was my anti virus). – Eddie Mar 24 '18 at 06:51
  • Yes it is why I said you were probably not measuring what you thought you were ;) especially with IO there can be a lot of causes. – Cyprien Autexier Mar 24 '18 at 07:43
  • This may be true but I'm not sure getting an exact millisecond count of each delimiter would have lead to the solution. The main clue was that certain delimiters were taking exponentially longer than others, and the culprits were common delimiters which lead another user to suspect anti-virus. Your benchmark code would have shown this same exponential discrepancy, just more accurately. My inaccurate code may have showed 1 sec for comma while yours shows 600 ms but they both illustrated the same thing: comma was taking exponentially longer than others. I still appreciate what you wrote up. – Eddie Mar 24 '18 at 16:14
2

Here is a better test:

var delimiters = "@#$<,\\.\t ";

var lines =
    delimiters
        .Select(d => new
        {
            delimiter = d,
            lines =
                Enumerable
                    .Range(0, 1000)
                    .Select(n => String.Join(d.ToString(), Enumerable.Repeat("qwasdasdasde", 10)))
                    .ToArray() 
        })
        .ToDictionary(x => x.delimiter, x => x.lines);

var trials = 
    delimiters
        .ToDictionary(x => x, x => TimeSpan.Zero);

foreach (var i in Enumerable.Range(0, 1000))
{
    foreach (var delimiter in delimiters)
    {
        var sw = Stopwatch.StartNew();
        File.WriteAllLines("test.txt", lines[delimiter]);
        trials[delimiter] = trials[delimiter].Add(sw.Elapsed);
    }
}

This is producing trials for 9 different delimiters, each writing a file of 1,000 lines.

This produces the following result:

del  TimeSpan
'@'  00:00:04.8786198 
'#'  00:00:04.8475040 
'$'  00:00:04.8517601 
'<'  00:00:04.8679027 
','  00:00:04.8948792 
'\'  00:00:04.9045111 
'.'  00:00:04.8932993 
'\t' 00:00:07.7385623 
' '  00:00:07.7918592

Now, this doesn't produce an "outrageous" difference in performance, but space and tab do have a performance hit of about 1.6x that of the other delimiters. Your tests appeared to have a difference of about 1,100.0x.

Hopefully now someone can use these results to explain why there is a slow down on some delimiters.

Enigmativity
  • 113,464
  • 11
  • 89
  • 172
2

If you have antivirus, it may be hooked into your file I/O. Maybe turn it off and run the test again. It could be scanning with regex or other string matching algorithms that perform differently for commas, I suppose. (wild guess)

John Wu
  • 50,556
  • 8
  • 44
  • 80
  • I think this was a bug in BitDefender because my computer experiencing these symptons was running "Malware Engine Properties" version 7.75427 while another symptom-free computer ran 7.75428. I reinstalled it on the affected computer to match the symptom-free computer's version and everything is fine. My computer is also more responsive since this change so I'm just guessing the extra scanning that caused the comma to be slow was an error on their part. – Eddie Mar 24 '18 at 16:21