
Flow

I have a reader DLL written in C++.
I also have a writer DLL written in another language (not C++).
Both DLLs run in the same process, synchronously.

  1. Reader DLL calls the writer DLL's API, GetData
  2. Writer DLL prepares the data, e.g. by downloading or extracting it
  3. Reader DLL reads and uses the data

Question

What is the recommended way for DLLs to share data?


Approach 1

Reader DLL passes a file path argument to the writer DLL, then reads the data from that file.

Cons

I'd like to avoid writing data to disk. Even if this is the most robust solution, I'd like to explore other options, since writing data to disk when it isn't needed there doesn't seem very elegant to me.


Approach 2

Writer DLL allocates a buffer on the heap and returns its address and size to the reader DLL.

Cons

Reader DLL must free the memory. Is that feasible? Can memory be freed given only its address and size?

Also, allocating and freeing a buffer across modules/languages is probably a big no-no.


Approach 3

Split GetData() into two calls.

  1. Reader DLL calls GetDataSize()
  2. Reader DLL allocates a buffer and passes its address to the writer DLL
  3. Writer DLL fills the buffer
  4. Reader DLL uses the buffer
  5. Reader DLL frees the buffer

This is the accepted WinAPI approach.
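The steps above can be sketched roughly as follows. This is only an illustration: it folds the size query into a single hypothetical `GetData` export (a common Win32-style variant of the separate `GetDataSize()` call), and the signature, the `ReadAll` helper and the payload are all assumptions, not part of any real API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// --- Writer side (hypothetical export; name and signature are assumptions) ---
// Always stores the required size in *required. Copies the payload into
// `buffer` and returns true only if `capacity` is large enough.
extern "C" bool GetData(unsigned char* buffer, std::size_t capacity,
                        std::size_t* required) {
    static const char payload[] = "hello from the writer";
    *required = sizeof(payload);
    if (buffer == nullptr || capacity < *required) {
        return false;  // caller retries with at least *required bytes
    }
    std::memcpy(buffer, payload, *required);
    return true;
}

// --- Reader side: owns the buffer for its whole lifetime ---
std::vector<unsigned char> ReadAll() {
    std::size_t needed = 0;
    GetData(nullptr, 0, &needed);            // first call: query the size only
    std::vector<unsigned char> buf(needed);  // reader allocates
    // Loop in case the required size grew between the two calls.
    while (!GetData(buf.data(), buf.size(), &needed)) {
        buf.resize(needed);
    }
    buf.resize(needed);                      // trim if the data shrank
    return buf;                              // freed by the vector's destructor
}
```

Because the reader both allocates and frees, no memory ownership crosses the module boundary; the cost is that the writer may have to do its work twice, or cache the result between calls.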

Cons

This assumes that the writer DLL knows the size of the data before writing it, which is not always the case.


Approach 4

Use Windows file mapping.

Cons

Similar cons to Approach 2 & 3.

  • Who will create the file mapping?
  • Who will unmap?
  • File mapping has no dynamic size. You must define the size when creating it.
idanshmu
  • If the DLLs are in the same process they can share memory directly, you just need to pass memory pointers around. – Jonathan Potter Sep 16 '18 at 08:55
  • What is the other DLL written in? The one that you say isn't written in C++. That might influence the best answer. – selbie Sep 16 '18 at 08:57
  • 2 and 3 are fine. 4 is massive overkill. You can readily use 2 by using a shared heap. LocalAlloc. CoTaskMemAlloc. HeapAlloc. Take your pick. Question seems too broad and opinion based though. – David Heffernan Sep 16 '18 at 09:19
  • @DavidHeffernan Why too broad? You made me (and perhaps others) aware of the `HeapAlloc` & `HeapFree` WinAPIs. It is similar to approach 2 but eliminates its cons. Seems like it is the right approach for this case regardless of which language the writer DLL is written in. – idanshmu Sep 16 '18 at 09:27
  • [IMalloc](https://learn.microsoft.com/en-us/windows/desktop/api/objidl/nn-objidl-imalloc) provides everything you need to implement approach 2. This is particularly interesting when crossing language/runtime boundaries. – IInspectable Sep 16 '18 at 09:40

2 Answers


Note: as we are talking about passing data between two different languages, I'm going to assume we are talking about "raw" data (primitive types, PODs & co.) that don't need any special treatment on destruction; if this is not the case, please tell me in the comments.

  1. Obviously feasible, but I wouldn't consider it unless desperate. The two DLLs live in the same virtual address space, so they can share data directly in memory, without needing to go through the disk.

  2. Feasible and routinely done; the problem you must generally work around is that often the "default" heap of a given module is private (see note 1), so allocating from one and freeing from the other is a big no-no. There are two typical ways to implement this:

    • go through a heap that is surely available to both modules; in Win32, you'll often find LocalAlloc/LocalFree (or other Win32 API-provided heap primitives) used for this, as they are logically "below" all user-mode code, and provide a shared heap available to all the modules in the current process; so, one side knows that it must allocate using LocalAlloc, the other side knows that this data must be deallocated using LocalFree; everything works fine;
    • the allocating module also provides a deallocation function for the memory it allocates; the client code knows that whatever it received allocated by module A must be freed using the A_free() function. This in turn will probably just wrap your language's deallocation function, to be used as a counterpart to the allocations you do in the "business logic" exported functions. By the way, it may be useful to have an A_malloc() as well to mark the allocations that are expected to be freed by A_free(): even though they may be plain malloc/free today, you may want to change this later.
  3. Routinely done as well; Win32 APIs often have a special invocation form that lets you retrieve the size you need to allocate. It may be cumbersome to use or implement if that size cannot be computed easily without actually doing whatever the function has to do, or if the size fluctuates (the Win32 APIs that retrieve process data come to mind, where you may have to loop with ever larger allocations because the data to retrieve grows between one call and the next).

  4. Can be done, although I've never seen it done for in-process data; the overhead on allocation is going to be bigger than any "regular" heap function, but nothing like writing to file; in general it's more cumbersome than the LocalAlloc/LocalFree solution for no particular gain, so I wouldn't bother.
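The second variant of option 2 (the allocating module exports its own deallocation function) might look roughly like the sketch below. `A_malloc` and `A_free` are the names used above; `A_GetData`, `ReadOnce` and the payload are hypothetical, and plain `malloc`/`free` stand in for whatever the writer's language runtime really uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// --- Module A (the writer) ---
// Every buffer handed to clients comes from A_malloc, so the underlying heap
// can be changed later without touching client code.
extern "C" void* A_malloc(std::size_t size) { return std::malloc(size); }
extern "C" void  A_free(void* p)            { std::free(p); }

// Hypothetical "business logic" export: returns a buffer owned by A's heap.
extern "C" unsigned char* A_GetData(std::size_t* size) {
    static const char payload[] = "payload";
    unsigned char* out = static_cast<unsigned char*>(A_malloc(sizeof(payload)));
    if (out == nullptr) {
        *size = 0;
        return nullptr;
    }
    std::memcpy(out, payload, sizeof(payload));
    *size = sizeof(payload);
    return out;
}

// --- Client (the reader) ---
std::string ReadOnce() {
    std::size_t size = 0;
    unsigned char* data = A_GetData(&size);
    if (data == nullptr) return {};
    // Copy out what is needed (dropping the trailing NUL) ...
    std::string copy(reinterpret_cast<const char*>(data), size - 1);
    // ... then release through module A, never through free()/delete here.
    A_free(data);
    return copy;
}
```

On the C++ side, the `A_free` call could be wrapped in a `std::unique_ptr` with a custom deleter so the buffer is released even on early returns.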

Personally, I'd go with option 2: it's trivial to implement and doesn't require big changes to how you would usually write this stuff in C. The only difference is that you must use a particular pair of allocation/deallocation functions when working with this data.
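As a rough illustration of the shared-heap flavour of option 2, the sketch below has both sides agree on the Win32 process heap (`HeapAlloc`/`HeapFree` on `GetProcessHeap()`). `SharedAlloc`, `SharedFree`, `WriterGetText` and `ReaderUseText` are hypothetical names; the non-Windows branch is only a stand-in so the sketch builds elsewhere and gives no cross-module guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

#ifdef _WIN32
#include <windows.h>
// The process heap is reachable from every module in the process.
static void* SharedAlloc(std::size_t n) { return HeapAlloc(GetProcessHeap(), 0, n); }
static void  SharedFree(void* p) { if (p != nullptr) HeapFree(GetProcessHeap(), 0, p); }
#else
#include <cstdlib>
// Stand-in for non-Windows builds only (assumption, not a shared heap).
static void* SharedAlloc(std::size_t n) { return std::malloc(n); }
static void  SharedFree(void* p) { std::free(p); }
#endif

// --- Writer side (hypothetical export): allocates on the agreed heap ---
extern "C" char* WriterGetText(std::size_t* length) {
    static const char text[] = "shared heap";
    char* out = static_cast<char*>(SharedAlloc(sizeof(text)));
    if (out == nullptr) {
        *length = 0;
        return nullptr;
    }
    std::memcpy(out, text, sizeof(text));
    *length = sizeof(text) - 1;  // length without the trailing NUL
    return out;
}

// --- Reader side: frees with the matching function of the same heap ---
std::string ReaderUseText() {
    std::size_t length = 0;
    char* text = WriterGetText(&length);
    if (text == nullptr) return {};
    std::string copy(text, length);
    SharedFree(text);
    return copy;
}
```

In a real setup the writer, whatever its language, would call the Win32 allocation function directly and the API documentation would state which function the reader must use to free the result.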

An extra possibility that comes to mind is to have your function take the allocation function as a callback parameter (and possibly a deallocation function as well, if your algorithm needs it; dynamically growing arrays come to mind). The caller supplies it, so the called DLL allocates from whatever heap the caller prefers.
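A minimal sketch of that callback idea, assuming hypothetical names throughout (`AllocFn`, `FreeFn`, `GetDataWithAllocator`, `ReadWithOwnHeap`): the writer grows its result chunk by chunk, but every allocation and release goes through functions the reader passed in.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

extern "C" {
// Callback types supplied by the caller (hypothetical signatures).
typedef void* (*AllocFn)(std::size_t size);
typedef void  (*FreeFn)(void* p);

// --- Writer side: builds a growing buffer using only the caller's heap ---
bool GetDataWithAllocator(AllocFn alloc, FreeFn release,
                          unsigned char** out, std::size_t* out_size) {
    static const char* const chunks[] = {"abc", "def", "ghi"};
    unsigned char* buf = nullptr;
    std::size_t size = 0;
    for (const char* chunk : chunks) {
        const std::size_t len = std::strlen(chunk);
        unsigned char* grown = static_cast<unsigned char*>(alloc(size + len));
        if (grown == nullptr) {
            if (buf != nullptr) release(buf);
            return false;
        }
        if (buf != nullptr) {
            std::memcpy(grown, buf, size);  // keep what was already produced
            release(buf);                   // the old block goes back to the caller's heap
        }
        std::memcpy(grown + size, chunk, len);
        buf = grown;
        size += len;
    }
    *out = buf;
    *out_size = size;
    return true;
}
}  // extern "C"

// --- Reader side: decides which heap is used ---
static void* ReaderAlloc(std::size_t n) { return std::malloc(n); }
static void  ReaderFree(void* p)        { std::free(p); }

std::string ReadWithOwnHeap() {
    unsigned char* data = nullptr;
    std::size_t size = 0;
    if (!GetDataWithAllocator(ReaderAlloc, ReaderFree, &data, &size)) return {};
    std::string result(reinterpret_cast<const char*>(data), size);
    ReaderFree(data);  // same heap the reader handed out
    return result;
}
```

The writer never needs to know the final size up front, which addresses the main drawback of the two-call approach.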


Notes

  1. although it can be shared, e.g. if the two modules link dynamically against the same C runtime it probably is; OTOH, if the two modules are written in different languages this is highly unlikely.
Matteo Italia

The DLLs all run in the same process and address space. So they can share any data directly in memory. The challenge is only how to give access to the data, especially if you use different languages.

  • Option 1 is easy because you just need to pass the common file name to the reader. But why accept this overhead? There's a lighter, string-based variant: if you can pass a filename as a string, you could just as well let the writer serialise the data into a string and pass that to the reader.

  • Option 2 is more delicate. It's OK if memory is allocated and deallocated on the same side of the DLL boundary, or if you allocate your buffer using the Windows API. Otherwise it can be tricky, because memory allocation does not cross DLL boundaries easily (not because of the address space, but because of the risk of using different heaps and different allocation/release routines). Furthermore, you can't be sure that the calling programme manages the C++ object lifecycle properly (if you use some RAII design on the C++ side). So, this is an option only if the reader manages the buffer lifecycle:

    • the caller asks the reader to allocate, then gives the writer the address of the buffer, then calls the reader again to process the data and release the buffer.
    • a fixed buffer size is acceptable, i.e. the size of the data is known.
  • Option 3 is option 2 done well

  • Option 4 still has the disk overhead if you use file-backed mapped I/O (a mapping backed by the system paging file, created by passing INVALID_HANDLE_VALUE to CreateFileMapping, does not need a file of its own), with an additional question: can you map the same file twice in the same process? If you are tempted by this option, have a second look at the string-based variant that I proposed for option 1 above: the shared string plays the role of the memory mapping without the inconvenience of the file.

The string variant seems an easy alternative for crossing language barriers with complex data structures. Almost every language has strings. The producer can build its string without having to know the size of the data in advance. Finally, even if strings are managed differently across languages, there's always a way to access them read-only.

But my preferred variant would be to organize the whole thing in such a way that the writer (or the main programme acting as mediator) calls the processing functions of the reader as needed (when parts of the data are available), providing the data as arguments of well-defined types in function calls.
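One way to picture this push model is the sketch below, where the writer calls back into the reader once per chunk. `ChunkCallback`, `ProduceData`, `AppendChunk` and `Collect` are hypothetical names, and the three string chunks stand in for data that becomes available piece by piece.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

extern "C" {
// Hypothetical callback signature: one chunk per call, valid only during the call.
typedef void (*ChunkCallback)(const unsigned char* data, std::size_t size,
                              void* context);

// --- Writer side: pushes each part to the reader as soon as it is ready ---
void ProduceData(ChunkCallback on_chunk, void* context) {
    static const char* const parts[] = {"first ", "second ", "third"};
    for (const char* part : parts) {
        on_chunk(reinterpret_cast<const unsigned char*>(part),
                 std::strlen(part), context);
    }
}
}  // extern "C"

// --- Reader side: copies each chunk into storage it owns ---
static void AppendChunk(const unsigned char* data, std::size_t size,
                        void* context) {
    static_cast<std::string*>(context)
        ->append(reinterpret_cast<const char*>(data), size);
}

std::string Collect() {
    std::string all;
    ProduceData(AppendChunk, &all);  // context carries the reader's state
    return all;
}
```

Each side only ever frees memory it allocated itself, and the total size never has to be known in advance. In production code the callback should not let C++ exceptions escape into the writer's runtime.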

Christophe
  • Always risky answering poor questions. Some errors in this answer though. Perfectly reasonable to use a shared heap. Make it part of the API. Don't think memory mapping requires that the disk is involved. – David Heffernan Sep 16 '18 at 10:30
  • Cons to option 2 do not really apply. With [IMalloc](https://learn.microsoft.com/en-us/windows/desktop/api/objidl/nn-objidl-imalloc) none of the issues apply. Excluding the issue of misusing the API, which you cannot solve. You have to assume that all participating parties agree on a protocol. If you do not, then *any* solution is subject to failure. – IInspectable Sep 16 '18 at 11:05
  • @DavidHeffernan thanks for pointing this out. I edited to clarify. I was of course not challenging sharing of a heap allocated with system functions, but heap objects created in C++ (which may subdivide heap space allocated from the system). I think OP was not referring to memory mapping but file I/O mapping. Are you sure it doesn't use the disk? Because "*The size of the file mapping object is independent of the size of the file being mapped. However, if the file mapping object is larger than the file, the system expands the file before CreateFileMapping returns*" – Christophe Sep 16 '18 at 11:11
  • @IInspectable As said (I edited in the meantime to clarify), I was not challenging memory allocation using the Windows API (which on the C++ side would require a placement new to ensure the proper C++ object lifecycle), but C++ native allocation that does not match one to one the heap objects returned by the OS (and from the C++ point of view requires its objects to be destroyed before being deallocated). – Christophe Sep 16 '18 at 11:18
  • No need for placement new. C++ allows you to supply allocators when needed. Writing an allocator using the `IMalloc` interface is trivially simple. – IInspectable Sep 16 '18 at 11:20
  • No idea where placement new comes into this. I think you are missing the point here. – David Heffernan Sep 16 '18 at 11:53
  • I have no idea what you are talking about. Never mind. – David Heffernan Sep 16 '18 at 12:35
  • No problem: my main concern is to help OP with his/her question. You look very knowledgeable as well. Why not propose another answer in which you could develop all these arguments? – Christophe Sep 16 '18 at 15:21