0

I'm currently trying to understand a problem I'm facing. I have the following code:

using (FileStream sw = File.Create(mypath))
{
    sw.Write(source, 0, bytesRead);
    sw.Flush();
}

This is used in a webservice that can be run multiple time thus (even at the same time).

The problem I'm facing is that I suddenly had one file that had a duplicated content (thus the original content was 2x inside the file).

As I can't reproduce the problem I'm wondering if there is a possible constellation (in multithreading) where despite using File.Create to create the stream that the content from another call to the underlying method could result in an appended (instead of an overwritten) content?

Edit: As it was asked I'm trying to explain a bit more about how the multiple calls could be possible.

A third party tool creates important files (.xml) and calls my webservice to transfer them onto a server. If that transfer fails for any reason, the third party tool tries again to transfer them. As I'm seeing multiple transfer attempts in the logs within minutes of each other one fear (which I can't proof if its wrong or true despite max. logging) I have is that the first call takes too long and the second call comes when the first one is still going on. Thus they overlap each other (sadly I can't find any proof for or vs. this with the logs I have available thus I'm going with the worst case scenario, that they DO overlap and thus possible racing conditions occur which lead to this question).

Thomas
  • 2,886
  • 3
  • 34
  • 78
  • 2
    _"can be run multiple time"_ - even with the same `mypath` ? That would simply be a bug. – H H Aug 10 '16 at 10:17
  • yepp same mypath. I'm trying to fixate if its something unexpected from c#/asp.net I'm facing or if the creating program already made the error (thus the program that called the webservice and transmitted the source). From what I got from the logs I can't tell really if the stream was already closed when the second/third one was opened or not. – Thomas Aug 10 '16 at 10:19
  • Either way, you've got to prevent this. Not fix it. – H H Aug 10 '16 at 10:22
  • @HenkHolterman As it is a third party software that does the calls to the web services my chances of completely preventing it are almost 0. Thus I need first to understand what happens, make a fix to make sure I have at least something that prevents the problem and then talk with the third party guys when I'm sure where teh problem is originating from. – Thomas Aug 10 '16 at 10:25
  • Also I always assumed that .Create with no parameter is thread save and either creates a new file, overwrites (not appends) an existing one AND throws an error if a second thread tries to access it. or am I wrong here? – Thomas Aug 10 '16 at 10:26
  • @HenkHolterman I'm getting files from multiple devices (xml files with critical data), that I need to transfer from the devices onto a server for further processing. IF the file was only partially transmitted or the transmit did not work it will be sent again (the multiple times is what I fear "just" that it takes to long once and is tried again because of that, but the logs even at max. are not conclusive enough for me to see if that theory is correct or not. And I checked everything even disk latencies on the devices and the server to no avail so far). – Thomas Aug 10 '16 at 10:46
  • You can try to simulate: create 2 applications, one to write into same file with random intervals as you do, other to open file and check its length. I just did and test with single process (3 threads), there will be plenty of `IOException`s, but file will never get bigger or smaller than the *last* write operation would do. – Sinatr Aug 10 '16 at 12:08
  • @HenkHolterman, good catch (but I'd post it anyway), after waiting long enough I got several times what file size is different. So yes, not synchronized `File.Create` (or is it `Write`? or `Flush`?) doesn't ensure content will be from the last writing. Talking about multiple threads. – Sinatr Aug 10 '16 at 12:52

1 Answers1

0

Place sw.Seek(0, SeekOrigin.End); before your call to the Write-method. This will move the cursor to the end of the stream.

But your problem is not because you're "truncating" where you should "append". You need to design the above code thread-safe as a whole. The easiest way to achieve this is to introduce a lock handle:

private readonly object _lock = new object()

public void SaveToFile(string fileName, byte[] data)
{
    lock(_lock)
    {
        using(FileStream sw = FileStream.Create(fileName))
        {
            sw.Write(data, 0, data.Length);
        }
    }
}

The lock-block ensures, that code, which is using the same handle (_lock) is never executed at the same time.

Pro tipp: since you are disposing the stream (by using a using-block), you don't neccessarily need to flush it ;-)

Georg
  • 76
  • 6
  • 1
    A Seek() won't be thread-safe either. And when you make this file-op thread-safe you don't need the Seek anymore. – H H Aug 10 '16 at 10:21
  • @georg in essence I never wrote .write and .flush. instead I'm calling a freeware tool inside the using block which does the writing (buffered) and uses those two commands there. I guess they wrote it with .flush to be on the save side in case someone didn't use a using block there. So yeah it is absolutely unnecessary (I know), but I can't remove the flush^^' (as I can't modify that third party freeware tool) – Thomas Aug 10 '16 at 10:28
  • @Thomas got it :-) – Georg Aug 10 '16 at 10:35
  • @Georg does a reaadonly object really work there? (as lock must modify the objet if I'm not mistaken?) – Thomas Aug 10 '16 at 11:40
  • 1
    @Thomas, *"lock must modify the object"*, it does modify some bytes in memory (lock implementation details), but it doesn't change object reference. [Readonly](http://stackoverflow.com/q/277010/1997232) simply tells to compiler (and other programmers) what this field holds reference to instance of same object. It's optional to specify `readonly` for `private` fields (as you have control over it), but it's a good idea to do (to show your intents), e.g. in team development. – Sinatr Aug 10 '16 at 12:32
  • @ Thomas "lock" just consumes the handle ("object ID") which is why it can be a plain object. Same reference -> same lock handle @ Sinatr it is highly recommended to make the instance immutable with readonly to avoid potential errors even in a single developer environment :-) – Georg Aug 10 '16 at 12:41
  • @georg interesting. I never saw read only being used in such situations thus my question there. Itneresting. Will try that out and see how it works. – Thomas Aug 10 '16 at 12:48