Look at the flow of data in your application, then at the data rate your (I assume shared) disk system provides, the rate your GigE interconnect provides, and the topology of your cluster. Which of these is the bottleneck?
GigE provides a theoretical maximum of 125 MB/s between nodes, and everything has to funnel through the central node's single link, so moving 100 × 40 MB chunks (4 GB total) from the 100 processing nodes into your central node will take on the order of 30 s over GigE.
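Back-of-the-envelope:

```
100 chunks × 40 MB = 4,000 MB ≈ 4 GB
4,000 MB ÷ 125 MB/s ≈ 32 s
```

That is the theoretical floor; real GigE throughput after protocol overhead is typically a bit lower, so budget somewhat more.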
A file system shared between all your nodes provides an alternative to RAM-to-RAM transfer over Ethernet.
If your shared file system is fast at the disk read/write level (say, a bunch of many-disk RAID 0 or RAID 10 arrays aggregated into a Lustre file system or some such) and it uses a 20 Gb/s or 40 Gb/s interconnect between the block storage and the nodes, then 100 nodes each writing a 40 MB file to disk and the central node reading those 100 files may be faster than transferring the 100 × 40 MB chunks over the GigE node-to-node interconnect.
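If you go that route, the pattern is just plain file I/O on each side. Here is a minimal sketch, assuming the shared file system is mounted at /shared on every node, each processing node knows its own integer ID, and chunks are a fixed 40 MB; the path, file names, and sizes are placeholders, not details from your question:

```c
/* Shared-filesystem staging: workers write chunk files, the central node
   reads them back.  /shared, the file names, and node IDs are placeholders. */
#include <stdio.h>
#include <stdlib.h>

#define CHUNK_BYTES (40u * 1024 * 1024)         /* 40 MB per node (assumption) */

/* Runs on each processing node: stage this node's chunk to the shared FS. */
static void write_chunk(const char *data, int node_id)
{
    char path[64];
    snprintf(path, sizeof path, "/shared/chunk_%03d.bin", node_id);
    FILE *f = fopen(path, "wb");
    if (!f) { perror(path); exit(1); }
    fwrite(data, 1, CHUNK_BYTES, f);
    fclose(f);
}

/* Runs on the central node: pull all chunks back into one ~4 GB buffer. */
static void read_all_chunks(char *dest, int nchunks)
{
    for (int i = 0; i < nchunks; i++) {
        char path[64];
        snprintf(path, sizeof path, "/shared/chunk_%03d.bin", i);
        FILE *f = fopen(path, "rb");
        if (!f) { perror(path); exit(1); }
        fread(dest + (size_t)i * CHUNK_BYTES, 1, CHUNK_BYTES, f);
        fclose(f);
    }
}

int main(int argc, char **argv)
{
    /* Toy driver: "w <id>" on a worker, "r <count>" on the central node. */
    if (argc == 3 && argv[1][0] == 'w') {
        char *chunk = calloc(1, CHUNK_BYTES);   /* stand-in for real data */
        write_chunk(chunk, atoi(argv[2]));
        free(chunk);
    } else if (argc == 3 && argv[1][0] == 'r') {
        int n = atoi(argv[2]);
        char *all = malloc((size_t)n * CHUNK_BYTES);
        read_all_chunks(all, n);
        free(all);
    }
    return 0;
}
```

The write half runs on each processing node; the read half runs on the central node once all the files exist (you still need some signal, such as marker files or a barrier, to know the writers are done).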
But if your shared file system is a RAID 5 or 6 array exported to the nodes via NFS over the same GigE network, that will be slower than RAM-to-RAM transfer via GigE using RPC or MPI, because you have to write and then read the disks over GigE anyway.
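For the RAM-to-RAM path, MPI makes the "every node sends one chunk to the central node" pattern a single collective call. A minimal sketch, assuming one MPI rank per node with rank 0 on the central node and a fixed 40 MB chunk per rank (those are assumptions, not details from your question):

```c
/* Minimal RAM-to-RAM gather: every rank ships its 40 MB chunk to rank 0.
   Assumes one rank per node, rank 0 on the central node, fixed chunk size. */
#include <mpi.h>
#include <stdlib.h>

#define CHUNK_BYTES (40 * 1024 * 1024)          /* 40 MB per node (assumption) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    char *chunk = malloc(CHUNK_BYTES);          /* this node's processed data  */
    char *all   = NULL;
    if (rank == 0)                              /* central node needs ~4 GB    */
        all = malloc((size_t)nprocs * CHUNK_BYTES);

    /* ... fill `chunk` with this node's 40 MB result ... */

    /* One collective moves every chunk straight into rank 0's RAM; the wire
       time is bounded by rank 0's single network link. */
    MPI_Gather(chunk, CHUNK_BYTES, MPI_BYTE,
               all,   CHUNK_BYTES, MPI_BYTE,
               0, MPI_COMM_WORLD);

    /* ... rank 0 works on `all` here ... */

    free(chunk);
    free(all);
    MPI_Finalize();
    return 0;
}
```

The wire time is still bounded by the central node's single GigE link, but you avoid touching disk entirely, provided the central node can actually hold the ~4 GB in RAM, which is one of the open questions below.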
So, there have been some good answers and discussion on your question. But we did not know your node interconnect speed, we do not know how your disk is set up (one shared disk system, or one disk per node), and we do not know whether the shared disk has its own interconnect or what speed that is.
Node interconnect speed is now known. It is no longer a free variable.
Disk setup (shared or per-node) is unknown, thus a free variable.
Disk interconnect (assuming shared disk) is unknown, thus another free variable.
How much RAM your central node has is unknown (can it hold the 4 GB of data in RAM?), thus a free variable.
If everything, including the shared disk, uses the same GigE interconnect, then it is safe to say that 100 nodes each writing a 40 MB file to disk and then the central node reading those 100 40 MB files back from disk is the slowest way to go, unless your central node cannot allocate 4 GB of RAM without swapping, in which case things probably get more complicated.
If your shared disk is high-performance, it may well be faster for the 100 nodes to each write a 40 MB file and for the central node to then read those 100 40 MB files.