
I'm using C++ to back up files from my clients' systems. Files keep flooding in from the client machines every second, and I have to get a file descriptor for every file I write on the server. Sometimes I receive as many as 10K files within a minute, so how can I actually make use of file descriptors to write multiple files efficiently?

I have a C++ socket listener; each client machine connects to the server and starts uploading files. Is it possible to write multiple files with a limited number of file descriptors? I have tried writing all the files into one large file, but then I have to keep track of the start and end byte offsets within the large file for every single file I write into it. That would be quite a lot of work.
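Roughly, the bookkeeping involved would look something like the sketch below (the FileIndexEntry struct and append_file helper are just illustrative names, not existing code):

    // Illustrative only: the kind of bookkeeping needed when packing many
    // uploads into one large archive file (names are hypothetical).
    #include <cstdint>
    #include <fstream>
    #include <string>
    #include <vector>

    struct FileIndexEntry {
        std::string name;     // original file name from the client
        std::uint64_t offset; // start byte within the large archive file
        std::uint64_t length; // size of the stored file in bytes
    };

    // Append one uploaded file's data to the archive and record where it went.
    void append_file(std::ofstream &archive,
                     std::vector<FileIndexEntry> &index,
                     const std::string &name,
                     const std::vector<char> &data)
    {
        FileIndexEntry entry;
        entry.name = name;
        entry.offset = static_cast<std::uint64_t>(archive.tellp());
        entry.length = data.size();
        archive.write(data.data(), static_cast<std::streamsize>(data.size()));
        index.push_back(entry);
    }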

I want to be able to write files with high performance and also keep the disk healthy. Any ideas to share?

Manikandaraj Srinivasan
  • If the bottleneck is the number of file descriptors per process, you can try to increase the limit of open file descriptors - 10K simultaneously open file descriptors is not really that many. – Maksim Skurydzin Oct 16 '12 at 13:53
  • Do you have 10K clients simultaneously? – Vaughn Cato Oct 16 '12 at 13:54
  • Each client is only going to be able to send one file at a time. It seems like you would close the first file before opening the second, so you would only be using one additional file descriptor per client. – Vaughn Cato Oct 16 '12 at 14:01
  • @VaughnCato Yes, sometimes we hit that number of clients. On average, around 2K clients connect simultaneously. – Manikandaraj Srinivasan Oct 16 '12 at 15:23
  • Along the lines of what sehe is describing: try keeping each file in memory until it is complete, and then open/write/close as a single operation. – Vaughn Cato Oct 16 '12 at 15:43

1 Answer


It seems you should probably queue the requests.

That way you can have a limited number of workers (say, 16, depending on the hardware on your system) to actually transfer files, and keep the rest waiting.

This would likely improve the throughput and performance since

  • it removes the thread scheduling overhead
  • it removes the resource bottleneck (fd + buffer allocations)
  • it doesn't hit the disks concurrently (on old-fashioned hardware, i.e. spinning disks most often, concurrent access can lead to major performance degradation because it causes repeated disk seeks)

    • disclaimer: your filesystem/OS may work together to limit/reduce this effect by rescheduling disk writes (elevator/completely fair queuing and other algorithms).
    • regardless, when writing sequential uploads to shared volumes, bandwidth usage will usually be optimal when

      sum(upload rates) ~= effective disk write bandwidth
      

On the queueing:
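A minimal sketch of what that could look like, assuming each upload is fully buffered in memory before it is handed off to a writer (the FileJob type, the worker count, and all helper names are just illustrative, not a definitive implementation):

    // Illustrative sketch only: a fixed pool of writer threads draining a shared
    // queue of completed uploads, so only a few files are open at any time.
    #include <condition_variable>
    #include <fstream>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <string>
    #include <thread>
    #include <vector>

    struct FileJob {
        std::string path;       // destination path on the server
        std::vector<char> data; // complete file contents received from the client
    };

    class WriteQueue {
    public:
        void push(FileJob job) {
            {
                std::lock_guard<std::mutex> lock(mutex_);
                jobs_.push(std::move(job));
            }
            cv_.notify_one();
        }

        // Workers block here until a job arrives or the queue is shut down.
        bool pop(FileJob &job) {
            std::unique_lock<std::mutex> lock(mutex_);
            cv_.wait(lock, [this] { return stopped_ || !jobs_.empty(); });
            if (jobs_.empty()) return false; // stopped and fully drained
            job = std::move(jobs_.front());
            jobs_.pop();
            return true;
        }

        void stop() {
            {
                std::lock_guard<std::mutex> lock(mutex_);
                stopped_ = true;
            }
            cv_.notify_all();
        }

    private:
        std::mutex mutex_;
        std::condition_variable cv_;
        std::queue<FileJob> jobs_;
        bool stopped_ = false;
    };

    // Each worker uses exactly one file descriptor at a time: open, write, close.
    void writer_thread(WriteQueue &queue) {
        FileJob job;
        while (queue.pop(job)) {
            std::ofstream out(job.path, std::ios::binary);
            out.write(job.data.data(), static_cast<std::streamsize>(job.data.size()));
        }
    }

    int main() {
        WriteQueue queue;
        const unsigned kWorkers = 16; // tune to your hardware
        std::vector<std::thread> workers;
        for (unsigned i = 0; i < kWorkers; ++i)
            workers.emplace_back(writer_thread, std::ref(queue));

        // The socket code would push a FileJob here once an upload completes.
        queue.push({"/tmp/example.bin", std::vector<char>{'h', 'i'}});

        queue.stop();
        for (auto &t : workers) t.join();
    }

The socket threads then only buffer incoming data and hand completed files to the queue, which also matches the suggestion in the comments of keeping each file in memory and doing a single open/write/close per file.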

sehe