
When I use GetManifestResourceStream to retrieve an embedded resource from a .NET assembly, what kind of I/O is involved?

I see two possibilities:

  1. The entire assembly was already put into memory when .NET loaded it, so GetManifestResourceStream is just accessing memory.

  2. Only the code parts of the assembly were put into memory when the assembly was loaded by .NET, so GetManifestResourceStream needs to go back to the .dll file to extract the embedded resource.

I'm pretty sure the first is the case, especially since assemblies can be loaded dynamically from raw data with Assembly.Load(Byte[]). But then I wonder what happens if a very large file (say several gigabytes) were embedded; the second option might be more efficient. Does size matter?

Just challenging some long-held assumptions, and not able to find much in the way of reference on this.
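For reference, here is a minimal sketch of the call being asked about. It enumerates an assembly's manifest resources with `Assembly.GetManifestResourceNames` and opens each one with `GetManifestResourceStream`. The `ResourceListing` class and its `Describe` method are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;

public static class ResourceListing
{
    // Returns the name and length of every manifest resource in `assembly`.
    public static List<(string Name, long Length)> Describe(Assembly assembly)
    {
        var result = new List<(string Name, long Length)>();
        foreach (string name in assembly.GetManifestResourceNames())
        {
            // GetManifestResourceStream returns null if no resource has this name.
            using Stream? stream = assembly.GetManifestResourceStream(name);
            if (stream != null)
            {
                result.Add((name, stream.Length));
            }
        }
        return result;
    }
}
```

Usage: `foreach (var (name, length) in ResourceListing.Describe(Assembly.GetExecutingAssembly())) Console.WriteLine($"{name}: {length} bytes");`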

Matt Johnson-Pint
  • I have no idea, but _gigabytes_ ...? Are you sure there isn't a better method to store and distribute content than as embedded resources? – KristoferA Apr 21 '17 at 17:22
  • Of course - that's just a hypothetical. I'm thinking that there must be either some limit of largest resource size allowed, or some branching in how something that large would be loaded. – Matt Johnson-Pint Apr 21 '17 at 17:25
  • I know nothing about .NET, but a modern OS will usually let you map a file into virtual memory and then fetch particular parts of it when those memory addresses are accessed. So it loads only the blocks that are actually accessed, while pretending the whole file is in memory, which in your case eats "gigabytes" of virtual address space (may hurt on a 32-bit platform! ... on 64-bit it's probably no big deal). – Ped7g Apr 21 '17 at 17:46
  • To be clear, I typically have embedded resources on the order of a few kb, or maybe 1MB. I'm just curious about the i/o perf and if something different happens with large files or not. – Matt Johnson-Pint Apr 21 '17 at 17:48
  • @MattJohnson-Pint According to the accepted answer, the statement `I'm pretty sure the first is the case` is actually wrong? – joe Jan 28 '21 at 03:13


"Memory" is not a precise enough term on a demand-paged virtual memory operating system like Windows, Linux or macOS. The CLR maps the assembly into the address space of the process using a memory-mapped file (MMF). At that point the assembly is just a range of addresses to the processor, one entry for every 4096-byte page. Nothing is read from the file yet.

That is delayed until the program tries to read from an address inside that range. The first access generates a page fault: the kernel allocates RAM for the page and fills it with the file content, after which the program resumes as though nothing happened. This strongly reinforces the "you don't pay for what you don't use" advantage of virtual memory.
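The same mechanism can be observed from managed code with `System.IO.MemoryMappedFiles`. This is only an analogy: the CLR maps assemblies through the native OS loader, not through this API. The sketch below writes a zero-filled file with a marker as its last byte, maps it read-only, and reads the marker through the mapping without any explicit file read. `DemandPagingDemo`, `ReadLastByteViaMapping` and `PageIndex` are hypothetical names, and the page size comes from `Environment.SystemPageSize` rather than being assumed to be 4096.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

public static class DemandPagingDemo
{
    // Index of the page that contains `offset`, using the OS page size (often 4096).
    public static long PageIndex(long offset) => offset / Environment.SystemPageSize;

    // Creates a zero-filled file of `size` bytes whose last byte is `marker`,
    // maps it read-only and reads that byte through the mapping. There is no
    // explicit read call: touching the address is what makes the OS map in the
    // page holding the marker (from the file cache or from disk).
    public static byte ReadLastByteViaMapping(string path, long size, byte marker)
    {
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            fs.SetLength(size);              // extends the file with zeros
            fs.Seek(size - 1, SeekOrigin.Begin);
            fs.WriteByte(marker);
        }

        using var mmf = MemoryMappedFile.CreateFromFile(
            path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read);
        // Offset 0 and size 0 map the whole file.
        using var view = mmf.CreateViewAccessor(0, 0, MemoryMappedFileAccess.Read);
        return view.ReadByte(size - 1);
    }
}
```

For a 1 MiB file, only the page at `PageIndex(size - 1)` has to be touched to read the marker; the OS may read ahead, but nothing forces the rest of the file into RAM.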

There is no "extraction"; you are reading the resource data directly from memory, the most efficient way it could have been implemented. An embedded resource does not behave any differently from other data in the file, like the metadata and the MSIL. Likewise, you don't pay for any code in the assembly that you never call.
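One way to see this from code is to check the concrete type of the returned stream. In current runtimes, an embedded resource comes back as an `UnmanagedMemoryStream` over the loaded image; treat that as observed behavior rather than a documented contract. The sketch below, with the hypothetical helper `ResourceSlice.Read`, seeks into a resource and copies only the requested slice, so the bytes outside it are never copied.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class ResourceSlice
{
    // Reads up to `count` bytes starting at `offset` from a manifest resource,
    // copying nothing outside that slice. Returns null if the resource is missing.
    public static byte[]? Read(Assembly assembly, string name, long offset, int count,
                               out bool memoryBacked)
    {
        memoryBacked = false;
        using Stream? stream = assembly.GetManifestResourceStream(name);
        if (stream == null) return null;

        // Observed (not contractual): embedded resources are exposed as an
        // UnmanagedMemoryStream pointing into the assembly's mapped image.
        memoryBacked = stream is UnmanagedMemoryStream;

        stream.Seek(offset, SeekOrigin.Begin);
        var buffer = new byte[count];
        int total = 0;
        while (total < count)
        {
            int n = stream.Read(buffer, total, count - total);
            if (n == 0) break; // reached the end of the resource
            total += n;
        }
        Array.Resize(ref buffer, total);
        return buffer;
    }
}
```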

Do keep in mind that an embedded resource consumes the same OS resource as the GC heap: it too requires address space. The only real difference is that GC heap address space is backed by the OS paging file and can never be shared with other processes, whereas the assembly data is backed by the assembly file and can be shared. Large resources therefore noticeably shrink the amount of memory you can allocate in a .NET program, even if you never use them. That matters only in a 32-bit process; a 64-bit process has many terabytes of address space.
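A related point for large resources in a 32-bit process: reading a resource into one `byte[]` puts a second, full-size copy on the GC heap next to the mapped one. A minimal sketch of the alternative, assuming the data only needs to be forwarded somewhere (a file, a socket), is to stream it through a small reusable buffer. `ResourceCopy.CopyInChunks` is a hypothetical helper; the built-in `Stream.CopyTo(destination, bufferSize)` does the same thing, and the loop is spelled out only to make the buffering visible.

```csharp
using System.IO;

public static class ResourceCopy
{
    // Copies `source` to `destination` through one reusable buffer, so this
    // method never holds more than `bufferSize` bytes of the data on the GC heap.
    // 81920 matches the default buffer size used by Stream.CopyTo.
    public static long CopyInChunks(Stream source, Stream destination, int bufferSize = 81920)
    {
        var buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, n);
            total += n;
        }
        return total;
    }
}
```

Usage: `using var s = asm.GetManifestResourceStream(name); using var f = File.Create(path); ResourceCopy.CopyInChunks(s, f);`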

Another restriction is that an MMF view can never be larger than 2 GB, even in a 64-bit process, which sets a hard upper limit on the maximum size of a resource. In practice it usually keels over much earlier: the build fails with CS1566, "Specified argument was out of the range of valid values". Not a great diagnostic, by the way.

Hans Passant
  • So an embedded resource will just sit in the heap at all times during a program's runtime, then? I would have thought it was compiled into the executable, and whenever the resource is needed it references that memory from disk and copies it out, though I guess that wouldn't be any more efficient than reading a file from disk. I suppose I always thought a resource was there to save yourself from having to reference and manage files that are used often (a logo, for example), not really as a performance booster. – Trevor Hart Apr 21 '17 at 18:10
  • No, the heap has nothing to do with it. It merely occupies address space. The advantages of the MMF are that you don't have to read it from the file explicitly, it doesn't require space in the paging file, it gets cached automatically so any subsequent access is very cheap, and it can simply be discarded when other processes need RAM. Those are the advantages of a demand-paged virtual memory operating system, and it uses all of them. – Hans Passant Apr 21 '17 at 18:21