Am I correct to assume that when a process calls malloc, there may be I/O involved (swapping pages out, etc.) to make memory available, which in turn implies it can block for a considerable time? If so, shouldn't we have two versions of malloc in Linux: one, say "fast_malloc", suitable for obtaining smaller chunks and guaranteed not to block (though it may of course still fail with OUT_OF_MEMORY), and another, "async_malloc", where we could ask for arbitrarily sized space but must supply a callback?
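To make the idea concrete, here is a rough sketch of the interface I'm imagining (fast_malloc and async_malloc are hypothetical names; nothing like this exists in glibc as far as I know):

```c
#include <stddef.h>

/* Hypothetical: returns immediately; NULL if the request cannot be
 * satisfied without blocking (i.e. treat NULL as OUT_OF_MEMORY). */
void *fast_malloc(size_t size);

/* Hypothetical: may take arbitrarily long (page reclaim, swapping);
 * fires cb(ptr, ctx) when done, with ptr == NULL on failure. */
typedef void (*malloc_cb)(void *ptr, void *ctx);
void async_malloc(size_t size, malloc_cb cb, void *ctx);
```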
Example: if I need a small chunk of memory for a new linked-list node, I may prefer the traditional inline malloc, knowing the OS should be able to satisfy it 99.999% of the time or just fail. Another example: if I'm a DB server trying to allocate a sizable chunk to hold indexes, I may opt for async_malloc and deal with the "callback complexity".
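In terms of the hypothetical interface above, those two cases would look roughly like this:

```c
struct node { struct node *next; int value; };

/* Completion handler for the large, possibly slow allocation. */
static void on_index_mem(void *ptr, void *ctx)
{
    (void)ctx;
    if (ptr == NULL)
        return;                 /* allocation failed */
    /* ... build the indexes inside ptr ... */
}

void examples(void)
{
    /* Small allocation: fail fast rather than risk blocking. */
    struct node *n = fast_malloc(sizeof *n);
    if (n == NULL) {
        /* treat as OUT_OF_MEMORY */
    }

    /* Large allocation: tolerate the latency, just not on this thread. */
    async_malloc(256UL * 1024 * 1024, on_index_mem, NULL);
}
```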
The reason I bring this up is that I'm looking to build highly concurrent servers handling hundreds of thousands of web requests per second, and I generally want to avoid dedicating a thread to each request. Put another way, any time I/O occurs I want it to be asynchronous (say, libevent-based). Unfortunately, I'm realizing most C APIs lack proper support for asynchronous use. For example, the ubiquitous MySQL C library is entirely blocking, and that's just one of the libraries my servers use extensively. Again, I can always simulate non-blocking behavior by offloading to another thread, but that's nowhere near as cheap as waiting for the result via a completion callback.
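For reference, this is the thread-offload workaround I mean, as a minimal sketch (async_query, query_job, and query_cb are names I made up; in a real libevent server the callback would also have to be marshalled back onto the event-loop thread, e.g. through a pipe):

```c
#include <pthread.h>
#include <stdlib.h>
#include <mysql/mysql.h>

/* Completion callback: receives the result set, or NULL on error. */
typedef void (*query_cb)(MYSQL_RES *res, void *ctx);

struct query_job {
    MYSQL      *conn;    /* connection used exclusively by this query */
    const char *sql;
    query_cb    cb;
    void       *ctx;
};

static void *query_worker(void *arg)
{
    struct query_job *job = arg;
    MYSQL_RES *res = NULL;

    mysql_thread_init();                      /* the client library wants this per thread */
    if (mysql_query(job->conn, job->sql) == 0)
        res = mysql_store_result(job->conn);  /* blocking, but off the event loop */

    job->cb(res, job->ctx);   /* NOTE: runs on the worker thread */

    mysql_thread_end();
    free(job);
    return NULL;
}

/* Offload a blocking mysql_query() to a detached worker thread. */
static int async_query(MYSQL *conn, const char *sql, query_cb cb, void *ctx)
{
    struct query_job *job = malloc(sizeof *job);
    if (job == NULL)
        return -1;
    job->conn = conn;
    job->sql  = sql;
    job->cb   = cb;
    job->ctx  = ctx;

    pthread_t tid;
    if (pthread_create(&tid, NULL, query_worker, job) != 0) {
        free(job);
        return -1;
    }
    pthread_detach(tid);
    return 0;
}
```

Spawning (or even pooling) a thread per query like this is exactly the overhead I'd like to avoid, which is why a genuinely callback-based API would be so much nicer.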