Does a single large read tend to be faster or slower than many small reads of the same total size?
It depends, but a single large read is generally faster.
On one hand, the number of IO requests an SSD can serve per second (see IOPS) is bounded. While this limit is quite high for SSDs compared to HDDs (especially recent SSDs), it often becomes the bottleneck when reading many small files. Note that current high-performance NVMe SSDs can reach about 300K IOPS, but the file system needs to perform several IO requests per file, so the number of small files that can actually be read per second is generally much smaller.
On the other hand, reading huge buffers can be slower because the OS usually needs to perform internal copies of the buffer, and this introduces a significant overhead when the buffer does not fit in CPU caches on high-performance SSDs. Note that this is very dependent on the API used to read the target files (see the second part of this related answer). Most standard libraries for reading/writing files are buffered, so reading very small chunks is not that slow (though still slower than medium-sized chunks because of the additional calls/operations). Buffering only helps for sequential reads/writes, though.
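As a rough way to see the difference on your own machine, here is a minimal benchmarking sketch (the file path `data.bin` and the chunk size are made-up placeholders). Keep in mind that the second run benefits from the OS page cache, so compare runs with caches dropped or alternate their order.

```python
import time

PATH = "data.bin"        # hypothetical test file, e.g. a few hundred MiB
CHUNK = 4 * 1024         # small chunk size for the many-reads case

def one_large_read(path):
    # A single read() call: the whole file is transferred into one buffer.
    with open(path, "rb") as f:
        return len(f.read())

def many_small_reads(path, chunk=CHUNK):
    # Many small read() calls: Python's buffered layer amortises syscalls,
    # but each call still adds per-call and copy overhead.
    total = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk)
            if not buf:
                break
            total += len(buf)
    return total

for fn in (one_large_read, many_small_reads):
    start = time.perf_counter()
    size = fn(PATH)
    print(f"{fn.__name__}: {size} bytes in {time.perf_counter() - start:.3f} s")
```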
Could I improve performance by first writing all files to an "in-memory" FS within my language and writing to disk later in a single batch?
Probably not. This is highly dependent on the OS implementation and the access pattern of your application. If the writes are contiguous, I do not expect a huge speed-up on a fast SSD, since writing and then reading the file in RAM adds an overhead and the cost of managing files is paid twice. Furthermore, note that some OSes restrict the size of an in-RAM FS, and this space may be reserved (it needs to be tuned and often requires elevated privileges). Besides, most OSes tend to cache read/written files in RAM anyway (and in a significantly more efficient way than an in-RAM FS). This is the case on both Windows and Linux by default.
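To illustrate why staging everything in RAM first rarely pays off, here is a minimal sketch (file names, sizes and output directories are made up) that uses a plain dict as a stand-in for an in-RAM FS: the per-file FS cost is still paid at flush time, plus an extra copy in RAM, while the OS page cache already buffers the direct writes.

```python
import os
import time

FILES = {f"file_{i:04d}.bin": os.urandom(16 * 1024) for i in range(1000)}

def write_directly(files, out_dir):
    # Write each file as soon as its content is ready.
    os.makedirs(out_dir, exist_ok=True)
    for name, data in files.items():
        with open(os.path.join(out_dir, name), "wb") as f:
            f.write(data)

def write_batched(files, out_dir):
    # Stage everything in RAM first (stand-in for an in-RAM FS), then flush
    # in one pass: the same per-file cost is paid anyway, plus an extra copy.
    staged = {name: bytes(data) for name, data in files.items()}
    os.makedirs(out_dir, exist_ok=True)
    for name, data in staged.items():
        with open(os.path.join(out_dir, name), "wb") as f:
            f.write(data)

for fn, out_dir in ((write_directly, "out_direct"), (write_batched, "out_batched")):
    start = time.perf_counter()
    fn(FILES, out_dir)
    print(f"{fn.__name__}: {time.perf_counter() - start:.3f} s")
```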
Same question for reads. Is it faster to load the whole folder into an in-memory FS vs dispatching many small reads interspersed with processing code?
Most OSes also cache read/written files in memory so they can be read faster later. The OS is responsible for freeing this cache when the memory is requested by processes, and you have little control over its behaviour. If the files have already been written/read, you have enough free memory, and the caching buffers are large enough, then there is no need for an in-RAM FS (it will likely be slower than the cache). If you read files non-sequentially and they tend to be evicted from the cache, then implementing a prefetching strategy using dedicated threads may help to speed up reads. Memory-mapped files and asynchronous low-level APIs may also help to implement that more efficiently in such cases.
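A minimal sketch of such a thread-based prefetching strategy, assuming a hypothetical `dataset` folder of many small files and a trivial `process()` placeholder:

```python
import os
from concurrent.futures import ThreadPoolExecutor

INPUT_DIR = "dataset"   # hypothetical folder containing many small files

def load(path):
    # Plain blocking read executed in a worker thread; several of these in
    # flight at once keep the SSD queue busy while the main thread works.
    with open(path, "rb") as f:
        return f.read()

def process(data):
    # Placeholder for the real processing code.
    return len(data)

paths = [os.path.join(INPUT_DIR, name) for name in sorted(os.listdir(INPUT_DIR))]

with ThreadPoolExecutor(max_workers=8) as pool:
    # map() submits all reads up front and yields results in order, so the
    # reads run ahead of the consumer; for huge folders, a bounded queue
    # would cap memory usage.
    for data in pool.map(load, paths):
        process(data)
```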
Note that packing many small files into one big file can significantly improve performance, since it strongly reduces the overhead of the FS subsystem (open system calls, close system calls, recursive directory fetches causing many additional IO requests in critical cases, etc.). A minimal sketch of such packing follows.
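The pack/index file names and the format below are made up for illustration; real projects often use an archive or database format instead. The idea is simply to concatenate the contents and keep an offset index, so reads become one seek plus one read in a single big file.

```python
import json
import os

PACK = "blob.pack"            # hypothetical packed data file
INDEX = "blob.index.json"     # hypothetical offset index

def pack(file_paths, pack_path=PACK, index_path=INDEX):
    # Concatenate all small files into one big file and record their offsets,
    # so later reads only need one open() plus a seek()/read() per entry.
    index = {}
    offset = 0
    with open(pack_path, "wb") as out:
        for path in file_paths:
            with open(path, "rb") as f:
                data = f.read()
            out.write(data)
            index[os.path.basename(path)] = [offset, len(data)]
            offset += len(data)
    with open(index_path, "w") as f:
        json.dump(index, f)

def read_entry(name, pack_path=PACK, index_path=INDEX):
    # Reading one entry avoids the per-file open/close of the original layout;
    # keeping the pack file open across many reads avoids even that one open().
    with open(index_path) as f:
        index = json.load(f)
    offset, size = index[name]
    with open(pack_path, "rb") as f:
        f.seek(offset)
        return f.read(size)
```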