7

I am writing the code in C++. Could I run into any race conditions or segfaults?

pingul
Invictus
  • 2
    If you need non-sequential concurrent file access I would recommend using a memory mapped file instead. Then just treat it like normal memory and do your own locking (which isn't needed if you can guarantee the write/read locations don't overlap). – edA-qa mort-ora-y Sep 27 '11 at 07:43

4 Answers

7

There's no problem doing that from the point of view of the underlying system (for all systems I know). However, typically you would need to have completely separate file descriptors/handles. This is because the file descriptor maintains state, e.g. the current file position.
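As a minimal sketch of the "separate handles" idea, assuming the C++ standard library's `std::fstream` is the file interface in use: each thread opens its own stream on the same file, seeks to its own slot, and writes there. The names `write_in_slots` and `read_all` and the fixed-size slot layout are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: every thread opens its own stream on `path` and fills
// its own non-overlapping slot of `chunk` bytes with one letter.
void write_in_slots(const std::string& path, std::size_t threads, std::size_t chunk) {
    assert(threads > 0 && threads <= 26 && chunk > 0);
    {
        // Create the file at its final size so every slot already exists.
        std::ofstream create(path, std::ios::binary | std::ios::trunc);
        create << std::string(threads * chunk, '.');
    }
    std::vector<std::thread> pool;
    for (std::size_t i = 0; i < threads; ++i) {
        pool.emplace_back([&path, chunk, i] {
            // in | out opens the existing file without truncating it.
            std::fstream f(path, std::ios::in | std::ios::out | std::ios::binary);
            f.seekp(static_cast<std::streamoff>(i * chunk));
            f << std::string(chunk, static_cast<char>('A' + i));
        });  // each stream flushes and closes when `f` goes out of scope
    }
    for (auto& t : pool) t.join();
}

// Hypothetical helper: reads the whole file back for inspection.
std::string read_all(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Because no stream object is shared, no thread can disturb another thread's file position. Whether this is safe on a given platform still depends on the points raised in the rest of this answer.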

You also need to check the thread-safety of the particular C++ interface to the filesystem that you are using. This is needed in addition to the thread-safety of the underlying filesystem.

You should also consider the possibility that threaded I/O will be slower. The system may have to serialise access to the bus. You may get better performance from overlapped I/O or a dedicated I/O thread fed through a producer/consumer pipeline.
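The dedicated I/O thread mentioned above could look roughly like the following sketch. It assumes a simple mutex-protected queue of (offset, data) jobs; `WriteJob`, `FileWriter` and `submit` are hypothetical names, not part of any library.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <fstream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical type: a request to write `data` at byte `offset`.
struct WriteJob {
    std::streamoff offset;
    std::string data;
};

// Hypothetical class: producers call submit(); one internal thread owns the file.
class FileWriter {
public:
    explicit FileWriter(const std::string& path)
        : out_(path, std::ios::binary | std::ios::trunc),
          worker_([this] { run(); }) {}

    ~FileWriter() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();  // the worker drains the remaining jobs before exiting
    }

    void submit(std::streamoff offset, std::string data) {
        assert(offset >= 0);
        {
            std::lock_guard<std::mutex> lock(m_);
            jobs_.push(WriteJob{offset, std::move(data)});
        }
        cv_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [this] { return done_ || !jobs_.empty(); });
            if (jobs_.empty()) return;  // only reachable once done_ is set
            WriteJob job = std::move(jobs_.front());
            jobs_.pop();
            lock.unlock();  // do the slow I/O without holding the lock
            out_.seekp(job.offset);
            out_ << job.data;
            lock.lock();
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<WriteJob> jobs_;
    bool done_ = false;
    std::ofstream out_;
    std::thread worker_;  // declared last so it starts after the other members
};
```

Only the worker thread ever touches the stream, so the file position and the stream buffer need no extra locking; the producers contend only on the queue.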

David Heffernan
  • In each thread I open the same file and then write to it at a position that is different for every thread. So I can go ahead and implement this, right? – Invictus Sep 27 '11 at 06:59
  • I could not say for sure. You have not shown your code. But so long as you have different handles you should be fine. – David Heffernan Sep 27 '11 at 07:10
4

Another solution, depending on the size of the file and the system you're running on, is to use memory-mapped files, i.e. mapping the file into virtual memory. This gives you direct access to the file as if it were a piece of memory. Any number of threads can then simply write to the memory region, and subsequent calls to flush the mapping to disk (depending on how you configure the memory mapping) will store the data on disk.

Do note that, due to addressing restrictions on 32-bit platforms, you will usually not be able to map a file larger than 2-3 GB, depending on the architecture and the number of bits actually available for virtual memory addressing. Most 64-bit systems have 48 bits or more available for this, allowing you to map at least 256 TB, which I'd say is more than sufficient.
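A rough sketch of this approach on a POSIX system, assuming `open`, `ftruncate`, `mmap`, `msync` and `munmap` are available. The helper name `mmap_fill` and the one-letter-per-slot layout are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <thread>
#include <vector>

#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, msync, munmap
#include <unistd.h>    // ftruncate, close

// Hypothetical helper (POSIX only): maps `path` and lets each thread fill its
// own non-overlapping slot of `chunk` bytes. Returns true on success.
bool mmap_fill(const std::string& path, std::size_t threads, std::size_t chunk) {
    assert(threads > 0 && threads <= 26 && chunk > 0);
    const std::size_t size = threads * chunk;

    int fd = open(path.c_str(), O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) return false;
    // The file must have its final size before it is mapped.
    if (ftruncate(fd, static_cast<off_t>(size)) == -1) {
        close(fd);
        return false;
    }

    void* mem = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (mem == MAP_FAILED) return false;
    char* base = static_cast<char*>(mem);

    std::vector<std::thread> pool;
    for (std::size_t i = 0; i < threads; ++i) {
        pool.emplace_back([base, chunk, i] {
            // Plain memory writes; no locking because the slots are disjoint.
            std::memset(base + i * chunk, static_cast<int>('A' + i), chunk);
        });
    }
    for (auto& t : pool) t.join();

    bool synced = msync(base, size, MS_SYNC) == 0;  // flush the mapping to disk
    bool unmapped = munmap(base, size) == 0;
    return synced && unmapped;
}
```

`MAP_SHARED` is what makes the writes reach the file; with `MAP_PRIVATE` they would stay in the process's private copy.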

Nick Bruun
  • Yes. Virtually all modern UNIXes support memory mapping through the `mmap` function. The _man_ page gives a lot more detail on it, which in your case would be [like this](http://nixdoc.net/man-pages/FreeBSD/mmap.2.html). Generally, the steps needed are simply to open a file descriptor, preallocate its size if necessary (manually or through `fallocate` or `posix_fallocate`), and then map it to memory, and you're good to go. – Nick Bruun Sep 27 '11 at 13:32
  • But the problem here is that I am using a hash table to store the incoming data, and I want to write this into the output file. So, is there any way? – Invictus Sep 27 '11 at 22:57
2

It depends. Files are not their handles, and streams are not files. These three different concepts must be kept clear.

Now, the operating system can open the same file several times, returning a different handle each time, each with its own "position pointer". If the file is opened in "share mode" for both reading and writing, you can seek each handle wherever you want and read or write as you like. Making sure the writes do not overlap is up to you. The system serializes the operations, either for the entire file or for part of it (the details depend on the operating system).

If every handle is attached to a different stream, each stream writes independently of the others. In this case, however, buffering is a complication: writes can be delayed, and reads can happen ahead of time and cover more than you asked for. Make sure you handle any overlap properly by flushing as appropriate.
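To illustrate the buffering point, here is a small single-threaded sketch with two independent `std::fstream`-family streams on the same file. The helper name `write_then_read` is hypothetical. It assumes `text` contains no newline, since it reads back a single line.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: writes `text` through one stream, then reads the file
// through a second, independent stream and returns what that stream sees.
std::string write_then_read(const std::string& path, const std::string& text,
                            bool flush_first) {
    std::ofstream writer(path, std::ios::binary | std::ios::trunc);
    assert(writer.is_open());
    writer << text;  // may still be sitting in the writer's buffer
    if (flush_first) writer.flush();  // push the buffer down to the file

    std::ifstream reader(path, std::ios::binary);
    std::string seen;
    std::getline(reader, seen);
    return seen;
}
```

With `flush_first` set to `true` the second stream sees the text. Without the flush, the data may still be held in the writer's buffer, so the reader may see an empty or partial file; the exact behaviour depends on the implementation's buffer size.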

Emilio Garavaglia
  • In each thread I open the same file and then write to it at a position that is different for every thread. So I can go ahead and implement this, right? – Invictus Sep 27 '11 at 07:00
  • @Invictus: You must open the file for sharing. Everything else should work. – Emilio Garavaglia Sep 27 '11 at 14:02
1

Sure you can. Race conditions may occur depending on how your code actually uses the file. Also, if the I/O is buffered, strange things may happen if the buffered regions overlap.

GreenScape