
I'm currently trying to get my head around memory-mapped files and potentially implement them in my application.

The application uses a series of large input files to find a location. When the application runs on a single thread, each file is accessed sequentially; once it's multithreaded, I guess this strays into more random-access territory, which makes MMFs seem worthwhile.

I'm a little confused about the usage, however: should I be newing up an MMF in each thread? I know they can share the same underlying file, so it seems like I'd want the same one across all threads and then just create a view into that MMF on each thread.

If I should only be creating one for each file, is there a way to test whether the MMF has already been created (via the name it's been assigned or some other means) so that I don't open multiple maps to the same file? Or would I need to know in advance which files each thread is going to use and pass in the already-created instances to prevent duplicates?

Cheers.

Joshua Mee

2 Answers


If each thread is dedicated to one file, then it might make sense to have each thread create its own MMF for the one file it is working on. Resources that are only used by a single thread are easier to allocate and destroy within the thread.
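As a rough illustration of the one-file-per-thread case, here is a sketch using Python's standard mmap module as a stand-in for .NET's MemoryMappedFile (the API the thread is actually about). The helper names search_file and search_all are hypothetical, and a simple byte-pattern search is assumed as the per-file work. Each worker opens, maps and closes its own file, so nothing is shared between threads and no lock is needed.

```python
import mmap
import threading


def search_file(path: str, needle: bytes, results: dict) -> None:
    """Hypothetical worker that owns its file: it maps, uses and releases it alone."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Hypothetical workload: find a byte pattern. Pages are faulted in on demand.
        # Each thread writes only its own key, so the threads never share an entry.
        results[path] = mm.find(needle)


def search_all(paths, needle: bytes) -> dict:
    """Start one thread per file and wait for all of them."""
    results = {}
    threads = [threading.Thread(target=search_file, args=(p, needle, results)) for p in paths]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Note that mapping a zero-length file raises ValueError in Python, so empty inputs would need to be filtered out first.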

However, if all the threads are reading from the same file, then you don't want to create multiple MMFs, because all that will do is multiply the amount of memory consumed and create coherency issues (multiple views of the same section of the file).

For multiple threads operating on the same file, you should create the MMF once and share the MMF pointer with the multiple threads.

Allocating on demand in a multithreaded situation gets complicated fast, and usually boils down to requiring a lock around every access to the protected resource. Requiring a lock can quickly defeat any performance advantage of running multiple independent threads, if they all have to line up and wait for access to the shared resource.

If you can allocate the shared resource prior to constructing/starting the threads, then you often don't need to lock around accessing the resource because the resource is always present by the time the threads need it.

So, I'd consider allocating the MMF before the threads spin up and sharing the MMF pointer across all the threads without locks.

This also assumes that the file is strictly read-only - that the multiple threads are never going to write back to the file or MMF. Multiple threads can share a pointer to a common memory area / MMF for read-only access without any thread concurrency issues.
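A minimal sketch of that create-before-start, read-only sharing pattern, again using Python's mmap module as a stand-in for .NET's MemoryMappedFile; parallel_lookup and its parameters are hypothetical. The mapping is created once, before any thread starts, and every thread reads from it without a lock. Slicing is used rather than mm.read(), because read() advances a position that is shared by everyone using the mmap object, while slicing leaves it untouched.

```python
import mmap
import threading


def parallel_lookup(path: str, offsets, length: int, workers: int = 4):
    """Hypothetical helper: map the file once, then read it from several threads."""
    out = [None] * len(offsets)
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # The mapping exists before any thread starts, so obtaining it needs no lock.
        def work(indices):
            for i in indices:
                off = offsets[i]
                # Slicing copies bytes out without moving the mmap's position,
                # so concurrent read-only threads don't interfere with each other.
                out[i] = mm[off:off + length]

        threads = [
            threading.Thread(target=work, args=(range(w, len(offsets), workers),))
            for w in range(workers)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    # All threads have finished before the mapping is closed by the with block.
    return out
```

Each thread handles a strided subset of the offsets and writes to distinct slots of out, so the only shared object the threads read concurrently is the read-only mapping itself.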

Be careful of your assumptions about MMF performance compared to traditional buffered file access. If your entire file data fits comfortably in available RAM, then MMF can be more performant for random access patterns than buffered file I/O. If the file data is much larger than available RAM, then buffered file I/O can be more performant for random access than using an MMF. Why? Because MMFs are piggish about memory use. MMFs can only load data in page-size chunks (typically 4k). Buffered file I/O can be more finely tuned to your actual data size needs and patterns. If your app loads 512 bytes of data from 100 different widely separated locations in the file, MMF will have to load 4k * 100 = 400k bytes of data even though you only need 512 * 100 = 50k of data. In this data access pattern / use case, MMF requires 8 times (4096 / 512) more data transfer and memory consumption than traditional file I/O.
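A back-of-envelope model of the page-granularity argument, written in Python. PAGE_SIZE = 4096 mirrors the answer's "4k" assumption (Python's mmap.PAGESIZE reports the real value on a given machine), and the function names are hypothetical. It counts only the whole pages a mapping must fault in for each read and ignores OS read-ahead and caching, so treat it as an illustration of the reasoning rather than a measurement.

```python
PAGE_SIZE = 4096  # assumed page size (the answer's "4k"); mmap.PAGESIZE gives the real one


def pages_touched(offset: int, length: int, page_size: int = PAGE_SIZE) -> int:
    """Whole pages a mapped read of `length` bytes at `offset` has to fault in."""
    if length <= 0:
        return 0
    first_page = offset // page_size
    last_page = (offset + length - 1) // page_size
    return last_page - first_page + 1


def mmf_vs_buffered(reads, page_size: int = PAGE_SIZE):
    """Return (bytes faulted in via the mapping, bytes actually requested).

    `reads` is an iterable of (offset, length) pairs. Read-ahead, caching and
    pages shared between nearby reads are deliberately ignored.
    """
    reads = list(reads)
    mapped = sum(pages_touched(o, n, page_size) for o, n in reads) * page_size
    requested = sum(n for _, n in reads)
    return mapped, requested
```

For the answer's scenario, 100 reads of 512 bytes at page-aligned offsets 1 MiB apart (offsets `i * 1_048_576`) give 409,600 mapped bytes against 51,200 requested, a ratio of 4096 / 512 = 8. A read that straddles a page boundary, such as `pages_touched(4000, 512)`, costs two pages.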

The main attraction of MMF is more often developer convenience rather than raw performance. Reading from a pointer backed by MMF is usually more convenient for the developer than writing and tuning a block-oriented file I/O subsystem. There's nothing wrong with using a technique because it's simple and convenient to the developer, as long as you acknowledge that truth.
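To make the convenience argument concrete, here is a sketch in Python (with mmap standing in for .NET's MemoryMappedFile) that reads fixed-size record number `index` both ways. The 16-byte record layout RECORD and both function names are hypothetical. The buffered version has to manage seeking and short reads itself; the mapped version treats the file as one big byte array and decodes the record in place with struct.unpack_from.

```python
import struct

# Hypothetical record layout: two little-endian int32 values and one float64 (16 bytes).
RECORD = struct.Struct("<iid")


def read_buffered(f, index: int):
    """Block-oriented style: position the file, read into a buffer, then decode."""
    f.seek(index * RECORD.size)
    data = f.read(RECORD.size)
    if len(data) != RECORD.size:
        raise IndexError(index)
    return RECORD.unpack(data)


def read_mapped(mm, index: int):
    """Mapped style: the file behaves like one byte array, decoded in place."""
    offset = index * RECORD.size
    if offset + RECORD.size > len(mm):
        raise IndexError(index)
    return RECORD.unpack_from(mm, offset)
```

Both return the same tuple; the mapped version simply has less bookkeeping, which is the convenience the answer describes rather than a guaranteed speed-up.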

dthorpe
  • Thanks for the thorough answer! I'm dealing purely with reading files, so that's not an issue, and I've been asked to look at the impact MMFs have in my particular case, so I'll try to implement it as much for understanding and curiosity as anything. Apologies if I'm misunderstanding: you say I'm best off passing a pointer, but the MSDN article uses "OpenExisting" on the name given when the map is created. Is there no way to test whether an MMF with that name exists? As the code is currently written, I don't know which files I'll be using until I'm already inside the thread, which is too late. – Joshua Mee Mar 06 '13 at 17:09
  • Sorry for the pointer reference - old Win32 native code habits die hard. ;> – dthorpe Mar 06 '13 at 21:00
  • What I suggest is if all the threads will be using the same file and only reading the data, then you can probably use just one MMF instance across all the threads. Call MMF.CreateFromFile once before the threads are started, and share that MMF object instance with all the threads - stuff the MMF object in a static variable that the threads can reach, for example. If you don't plan to use multiple views of the MMF, then you could probably also share one view across all the threads. – dthorpe Mar 06 '13 at 21:04
  • The reason you need to be careful about constructing multiple instances of MMFs for the same file on disk is to keep memory consumption under control. One MMF.CreateFromFile plus another MMF.CreateFromFile for the same file doubles the memory consumption. One MMF.CreateFromFile plus one MMF.OpenExisting only puts one copy of the file in memory. The two MMF instances will share the memory pages mapped to the file. Multiple views created off the same MMF don't adversely affect memory consumption. – dthorpe Mar 06 '13 at 21:09
  • If you don't know what file you will be operating on until after the threads are up and running, then you will need to set up a system so that one thread is responsible for constructing the initial MMF using CreateFromFile, and subsequent threads will construct their MMF using OpenExisting. – dthorpe Mar 06 '13 at 21:12
  • Yes, you can tell whether an MMF with a particular map name already exists. Call MMF.OpenExisting; if it throws a FileNotFoundException, the map name doesn't exist. – dthorpe Mar 06 '13 at 21:14
  • Thanks a bunch, couldn't have hoped for a more thorough answer! – Joshua Mee Mar 07 '13 at 09:59
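The comments above describe a CreateFromFile-then-OpenExisting handshake keyed by map name. Python's portable mmap module has no named-map lookup, so a comparable sketch keeps a process-wide dictionary keyed by file path and guarded by a lock: the first thread to ask for a file creates the mapping and later threads reuse the same instance. The class MappedFileRegistry is hypothetical. As the answer warns about on-demand allocation, every lookup takes the lock, but it is held only for the dictionary check and creation, not for the reads themselves.

```python
import mmap
import os
import threading


class MappedFileRegistry:
    """Hypothetical helper: at most one read-only mapping per file, created on first use."""

    def __init__(self):
        self._lock = threading.Lock()
        self._maps = {}  # absolute path -> mmap

    def get(self, path: str) -> mmap.mmap:
        key = os.path.abspath(path)  # so "a.bin" and "./a.bin" share one mapping
        with self._lock:  # serializes lookup/creation only, not the later reads
            mm = self._maps.get(key)
            if mm is None:
                with open(key, "rb") as f:
                    # The mapping stays valid after the file object is closed.
                    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
                self._maps[key] = mm
            return mm

    def close_all(self) -> None:
        """Release every mapping; call only after all reader threads have finished."""
        with self._lock:
            for mm in self._maps.values():
                mm.close()
            self._maps.clear()
```

Threads that only read through the returned mapping (for example by slicing) still need no further locking.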

Threads of a process always share the same address space, meaning that every thread has access to objects and resources that are "global" to the entire process.

You will need to synchronize access to the file in your process. It doesn't make much sense to reopen the file in every thread, especially when talking about "larger" amounts of data (several MB). MSDN offers an article on your topic; hope it helps.

bash.d
  • The MSDN article uses "OpenExisting" - this works if you know the map already exists, but in my case I don't know which ones I'll be needing. Is there any way to test if the file has already been opened? – Joshua Mee Mar 06 '13 at 16:38
  • Regarding this, you'll need to check it using FileInfo.Open(). [Here](http://stackoverflow.com/questions/876473/is-there-a-way-to-check-if-a-file-is-in-use) is an entry from SO. – bash.d Mar 06 '13 at 16:43