
I'm trying to do random writes (a benchmark test) to a file using multiple threads (pthread). It looks like if I comment out the mutex lock, the created file is smaller than it should be, as if some writes were getting lost (always by some multiple of the chunk size). But if I keep the mutex, the size is always exact.

Is there a problem somewhere else in my code and the mutex is not really required (as suggested by @evan), or is the mutex necessary here?

void *DiskWorker(void *threadarg) {
    FILE *theFile = fopen(fileToWrite, "a+");
    ....
    for (long i = 0; i < noOfWrites; ++i) {
        //pthread_mutex_lock(&mutexsum);

        // For random access
        fseek(theFile, randomArray[i] * chunkSize, SEEK_SET);
        fputs(data, theFile);

        // Or, for sequential access (in that case the 2 lines above would not be here)
        fprintf(theFile, "%s", data);
        // sequential access end

        fflush(theFile);
        //pthread_mutex_unlock(&mutexsum);
    }
    .....
}
sapy

2 Answers

2

You are opening a file using "append mode". According to C11:

Opening a file with append mode ('a' as the first character in the mode argument) causes all subsequent writes to the file to be forced to the then current end-of-file, regardless of intervening calls to the fseek function.

The C standard does not specify exactly how this should be implemented, but on POSIX systems it is usually implemented using the O_APPEND flag of the open function, with the actual flushing of data done by write. Note that the fseek call in your code should therefore have no effect.
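As a minimal sketch (assuming a POSIX system; the append_demo name and its parameters are placeholders), this is roughly what fopen(path, "a") followed by a seek and a write boils down to:

#include <fcntl.h>      /* open, O_* flags */
#include <string.h>     /* strlen */
#include <unistd.h>     /* lseek, write, close, ssize_t */

int append_demo(const char *path, const char *data) {
    /* Roughly what fopen(path, "a") does underneath on POSIX. */
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    /* The seek is allowed, but it does not affect where the data lands... */
    lseek(fd, 0, SEEK_SET);

    /* ...because O_APPEND moves the offset to end-of-file before each write. */
    ssize_t n = write(fd, data, strlen(data));

    close(fd);
    return n < 0 ? -1 : 0;
}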

I think POSIX requires this, as it describes how redirecting output in append mode (>>) is done by the shell:

Appended output redirection shall cause the file whose name results from the expansion of word to be opened for output on the designated file descriptor. The file is opened as if the open() function as defined in the System Interfaces volume of POSIX.1-2008 was called with the O_APPEND flag. If the file does not exist, it shall be created.

And since most programs use the FILE interface to send data to stdout, this probably requires fopen to use open with O_APPEND and to write the data with write (and not with functions like pwrite).

So if, on your system, fopen with the 'a' mode uses O_APPEND, flushing is done using write, and your kernel and filesystem correctly implement the O_APPEND flag, then using a mutex should have no effect, since the writes do not interleave:

If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation.

Note that not all filesystems support this behavior. Check this answer.


As for my answer to your previous question, my suggestion was to remove the mutex, as it should have no effect on the size of the file (and it didn't have any effect on my machine).

Personally, I have never really used O_APPEND and would be hesitant to do so, as its behavior might not be supported at some level; plus, its behavior is weird on Linux (see the "BUGS" section of pwrite(2)).
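As a hedged illustration only (not something the answer above recommends), positioned random writes could instead be done without O_APPEND by passing the offset to pwrite directly; the sketch below reuses the question's identifiers (fileToWrite, noOfWrites, randomArray, chunkSize, data) and assumes their types:

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Globals as in the question's code (types assumed). */
extern const char *fileToWrite;
extern long noOfWrites;
extern long *randomArray;
extern long chunkSize;
extern const char *data;

void *DiskWorkerPwrite(void *threadarg) {
    (void)threadarg;  /* unused in this sketch */

    /* No O_APPEND: each write goes exactly where we ask. */
    int fd = open(fileToWrite, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return NULL;

    for (long i = 0; i < noOfWrites; ++i) {
        /* pwrite() takes the offset as an argument and performs the
         * positioning and the write in one call, so concurrent threads
         * cannot interleave between a seek and a write. */
        pwrite(fd, data, strlen(data), (off_t)randomArray[i] * chunkSize);
    }

    close(fd);
    return NULL;
}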

1

You definitely need a mutex because you are issuing several different file commands. The underlying file subsystem can't possibly know how many file commands you are going to issue to complete your whole operation.

So you need the mutex.

In your situation you may find you get better performance by putting the mutex outside the loop. The reason is that, otherwise, switching between threads may cause excessive skipping between different parts of the disk. Hard disks take about 10 ms to move the read/write head, so that could potentially slow things down a lot.

So it might be a good idea to benchmark that.
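For concreteness, a rough sketch of the two placements worth benchmarking is below; it reuses the question's identifiers (mutexsum, noOfWrites, randomArray, chunkSize, data) with assumed types, and note that, per the other answer, the fseek only takes effect if the file is opened without append mode (e.g. "r+" instead of "a+"):

#include <pthread.h>
#include <stdio.h>

/* Globals as in the question's code (types assumed). */
extern pthread_mutex_t mutexsum;
extern long noOfWrites;
extern long *randomArray;
extern long chunkSize;
extern const char *data;

/* Variant A: lock around each seek + write pair (the commented-out lines
 * in the question). Threads interleave chunk by chunk. */
static void random_writes_lock_per_write(FILE *theFile) {
    for (long i = 0; i < noOfWrites; ++i) {
        pthread_mutex_lock(&mutexsum);
        fseek(theFile, randomArray[i] * chunkSize, SEEK_SET);
        fputs(data, theFile);
        fflush(theFile);
        pthread_mutex_unlock(&mutexsum);
    }
}

/* Variant B: lock once around the whole loop, as suggested above. Each
 * thread then finishes its whole batch before the next one starts, which
 * avoids extra head seeks on a spinning disk but serializes the I/O. */
static void random_writes_lock_outside(FILE *theFile) {
    pthread_mutex_lock(&mutexsum);
    for (long i = 0; i < noOfWrites; ++i) {
        fseek(theFile, randomArray[i] * chunkSize, SEEK_SET);
        fputs(data, theFile);
        fflush(theFile);
    }
    pthread_mutex_unlock(&mutexsum);
}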

Galik
  • If I put the mutex outside the loop, won't that be essentially sequential instead of multithreaded? – sapy Mar 20 '18 at 06:31
  • @sapy As far as the disk transfers go, yes. But you will still be offsetting them against whatever other processing you may be doing. I just see a potential for this to make things slower, so I'm suggesting you test. – Galik Mar 20 '18 at 06:35
  • Assuming the offsets are random (taking `randomArray[i] * chunkSize` as a hint), it shouldn't make a difference. Also, if the file is not huge relative to the available memory, most of the I/O can be buffered and the FS will take care of sequential access. Just re-evaluate the need for flushing after every write. – amritanshu Mar 20 '18 at 08:29