I've gotten ideas for multiple projects recently that all involve reading IP addresses from a file. Since they are all supposed to be able to handle a large amount of hosts, I've attempted to implement multi-threading or creating a pool of sockets and select()-ing from them in order to achieve some form of concurrency for better performance. On multiple occasions, reading from the file seems to be the bottleneck in enhancing performance. The way I understand it, reading from a file with fgets or similar is a synchronous, blocking operation. So even if I successfully implemented a client that connects to multiple hosts asynchronously, the operation would still be synchronous because I can only read one address at a time from a file.
/* partially pseudo code */
/* getaddrinfo() stuff here */
while(fgets(ip, sizeof(ip), file) {
FD_ZERO(&readfds);
/* create n sockets here in a for loop */
for (i = 0; i < socket_num; i++) {
if (newfd > fd[i]) newfd = fd[i];
FD_SET(fd[i], &readfds);
}
/* here's where I think I should connect n sockets to n addresses from file
* but I'm only getting one IP at a time from file, so I'm not sure how to connect to
* n addresses at once with fgets
*/
for (j = 0; j < socket_num; j++) {
if ((connect(socket, ai->ai_addr, ai->ai_addrlen)) == -1)
// error
else {
freeaddrinfo(ai);
FD_SET(socket, &master);
fdmax = socket;
if (select(socket+1, &master, NULL, NULL, &tv) == -1);
// error
if ((recvd = read(socket, banner, RECVD)) <= 0)
// error
if (FD_ISSET(socket, &master))
// print success
}
/* clear sets and close sockets and stuff */
}
I've pointed out my issues with comments, but just to clarify: I'm not sure how to perform asynchronous I/O operations on multiple target servers read from a file, since reading entries from file seems to be strictly synchronous. I've run into similar isssues with multithreading, with a marginally better degree of success.
void *function_passed_to_pthread_create(void *opts)
{
while(fgets(ip_addr, sizeof(ip_addr), opts->file) {
/* speak to ip_addr and get response */
}
}
main()
{
/* necessary stuff */
for (i = 0; i < thread_num; i++) {
pthread_create(&tasks, NULL, above_function, opts)
}
for (j = 0; j < thread_num; j++)
/* join threads */
return 0;
}
This seems to work, but since multiple threads are all processing the same file the results aren't always accurate. I imagine it's because multiple threads may process the same address from file at the same time.
I've considered loading all the entries from a file into an array/into memory, but if the file was particularly large I imagine that could cause memory issues. On top of that, I'm not sure it that even makes sense to do anyway.
As a final note; if the file I'm reading from happens to be a particularly large file with a huge amount of IPs then I do not believe either solution scales well. Anything is possible with C though, so I imagine there is some way to achieve what I'm hoping to.
To sum this post up; I'd like to find a way to improve a client-side applications performance using asynchronous I/O or multi-threading when reading entries from a file.