
There is a line in my code where I do this:

int sockDesc = socket(AF_INET, SOCK_DGRAM, 0);

Earlier it worked like a charm, but then one day the call suddenly returned a value less than zero. I checked standard error and found: Too many open files.

I realised that I had exited the program with Ctrl+C many times, so maybe the sockets are somehow still open and I need to do something about it (apart from writing a signal handler, of course), such as increasing some limits in /etc/sysctl.conf.

But that is wrong, right? When I exit the program, won't Linux automatically clean up after me?

Just to confirm this was not caused by stuff I had left open all over the RAM, I rebooted. Still the same error!

What is going on here? I had already closed my program and restarted it, so why do I get this error? What is the correct way to diagnose it? Is this really about sockets, or about some other type of open file descriptor? What should my next step be?

EDIT 1:

I ran another small program:

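#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
        int sd;

        /* Create a UDP (datagram) socket */
        sd = socket(AF_INET, SOCK_DGRAM, 0);
        if (sd < 0)
        {
                perror("Opening datagram socket error");
                return 1;
        }
        else
        {
                printf("Opening datagram socket....OK.\n");
        }

        close(sd);
        return 0;       /* exit status 0 signals success */
}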

This ran with no issues. Now I really do not understand what the issue could be. The exact same call fails when it runs from my actual code base.

Also, the output of lsof has only 236 lines for my user, which means that I am well below the soft limit of 1024.
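One caveat on that comparison: the open-file soft limit (RLIMIT_NOFILE) applies per process, not per user, so the number to compare against 1024 is the descriptor count of the failing process itself. A minimal sketch of how a process could report both values, assuming Linux with /proc mounted; the helper names count_open_fds and report_fd_usage are hypothetical and not from the original code base:

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: counts the entries in /proc/self/fd (Linux-specific).
   Returns -1 if the directory cannot be opened. */
static int count_open_fds(void)
{
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] != '.')    /* skip "." and ".." */
            count++;
    }
    closedir(dir);

    /* The directory stream used its own descriptor while we were counting. */
    return count - 1;
}

/* Hypothetical helper: prints the per-process limits next to the current count. */
static void report_fd_usage(void)
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
        perror("getrlimit");
        return;
    }
    printf("soft limit: %llu, hard limit: %llu\n",
           (unsigned long long)lim.rlim_cur,
           (unsigned long long)lim.rlim_max);
    printf("descriptors open in this process: %d\n", count_open_fds());
}
```

Calling report_fd_usage() just before the failing socket() call would show whether the process itself is close to its limit. From outside the process, lsof -p with the process ID gives a similar per-process view.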

EDIT 2:

Here is some explanation of how the code is structured:

There is a main file that creates multiple threads, each one handling a network connection to a multicast stream. Each thread begins with the line int sockDesc = socket(AF_INET, SOCK_DGRAM, 0);. This is where it fails: the call reports "Too many open files" and the socket is not created.

Chani
  • You should use [lsof](http://unixhelp.ed.ac.uk/CGI/man-cgi?lsof+8) to find out which process has what open files. – Florin Stingaciu Jul 28 '14 at 15:34
  • Can other programs open sockets? – Eric Finn Jul 28 '14 at 15:49
  • The operating system doesn't clean up immediately, but eventually it will. Minutes, not days, and a socket is just a data structure, so rebooting certainly resets them. – John C Jul 28 '14 at 15:53
  • You may use `SO_REUSEADDR`, see [this question](http://stackoverflow.com/q/775638/841108) – Basile Starynkevitch Jul 28 '14 at 18:14
  • @BasileStarynkevitch I have already done that in the `setsockopt` function call. – Chani Jul 28 '14 at 18:21
  • You need to provide more information for a solid answer. Just doing SO_REUSEADDR is not directly addressing the problem. Let's find out WHY this is happening to you instead of just trying to stop it, and not really correcting the root of the problem. First of all, what is your code project supposed to do? Can you provide a link to source code? – bazz Jul 28 '14 at 19:32
  • @bazz It basically reads a lot of data from multicast streams and processes it. The data is received in an infinite loop. I was simply pressing Ctrl+C to end the program. I earlier thought that not closing the socket descriptors might be causing the issue, but after I rebooted the machine and tried again, I knew this was about something else. Also, the error comes in the first few lines of the code. That's why I did not post any code. – Chani Jul 28 '14 at 19:38
  • Also, sadly the server has been shut down until tomorrow morning and I cannot try the lsof thing till then. I also want to try opening a socket through the command line and from another program. – Chani Jul 28 '14 at 19:41
  • you should post the code. – bazz Jul 29 '14 at 00:48
  • I have updated the question as asked. – Chani Jul 29 '14 at 07:33
  • So you show code that doesn't reproduce your problem, and talk only of your user's file count, when there are system-wide limits too... can't really help you, there's no real info to go by. – Mat Jul 29 '14 at 07:49
  • @Mat What would you need ? – Chani Jul 29 '14 at 07:51
  • Code that actually produces the problem. Contents of /proc/sys/fs/file-max when the problem happens. Your analysis of what other processes have large number of files open when the problem happens. – Mat Jul 29 '14 at 08:03
  • @Mat I know it is only *one line*, but trust me, that's the *only* thing that is producing the problem. However, I have updated the question a little more to describe another aspect of the code base. Also, the contents of /proc/sys/fs/file-max is 100000. And, like I said, when I write another small program to test opening a socket, it works. – Chani Jul 29 '14 at 08:07
  • So you have one program that works and one that doesn't. You've only shown the code for the one that does work. How can we help you fix the one that doesn't? Think about it for a while. That line that fails is the symptom. You need to find the cause. You also hadn't mentioned threads until now, which is kind of important. See http://stackoverflow.com/help/mcve. – Mat Jul 29 '14 at 08:21
  • @JohnC The operating system should release resources allocated by a process immediately the process terminates. Never seen one that didn't, apart from the pathological ones that didn't release them at all, or crashed instead. – user207421 Jul 29 '14 at 08:25
  • @Wilding, it looks like you never received help because you did not want to post your original source code. Or, you fixed your problem and left this question in the dust. I encourage you to post your own solution and the original source code. – bazz Sep 07 '14 at 03:51
  • @bazz You are right. I am adding an answer. – Chani Sep 10 '14 at 06:55

1 Answer


I used the lsof command as suggested in the comments, and that let me see what was going wrong: I had written a faulty constructor for a class that is instantiated a gazillion times. The constructor was creating a rogue file descriptor, hence the error.
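The faulty constructor itself was never posted, so the following is only a sketch of the general failure mode, written in C under stated assumptions: leaky_init is a hypothetical stand-in for an initialiser that opens a socket which nothing ever closes, and DEMO_FD_LIMIT is an arbitrary small soft limit chosen so the demonstration finishes quickly. It shows how repeated calls exhaust the per-process descriptor limit until socket() fails with EMFILE ("Too many open files").

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/resource.h>
#include <sys/socket.h>
#include <unistd.h>

#define DEMO_FD_LIMIT 64   /* arbitrary small soft limit, for demonstration only */

/* Hypothetical stand-in for a constructor that opens a descriptor
   which its owner never closes. */
static int leaky_init(void)
{
    return socket(AF_INET, SOCK_DGRAM, 0);
}

/* Calls leaky_init() repeatedly until socket() fails.
   Returns the errno of the failing call, 0 if no call failed,
   or -1 if the limits could not be read or changed. */
static int demo_fd_leak(void)
{
    struct rlimit saved, lowered;
    if (getrlimit(RLIMIT_NOFILE, &saved) != 0)
        return -1;

    /* Temporarily lower the soft limit so the leak shows up quickly. */
    lowered = saved;
    if (lowered.rlim_cur > DEMO_FD_LIMIT)
        lowered.rlim_cur = DEMO_FD_LIMIT;
    if (setrlimit(RLIMIT_NOFILE, &lowered) != 0)
        return -1;

    int fds[DEMO_FD_LIMIT + 1];
    int leaked = 0;
    int err = 0;

    /* One more attempt than the limit allows, so a failure is guaranteed. */
    while (leaked <= DEMO_FD_LIMIT) {
        int sd = leaky_init();
        if (sd < 0) {
            err = errno;
            printf("socket() failed after %d leaked descriptors: %s\n",
                   leaked, strerror(err));
            break;
        }
        fds[leaked++] = sd;
    }

    /* Clean up, which is the step the leaking code path was missing. */
    for (int i = 0; i < leaked; i++)
        close(fds[i]);
    setrlimit(RLIMIT_NOFILE, &saved);

    return err;
}
```

Calling demo_fd_leak() from main is expected to report EMFILE once the lowered limit is reached. This also explains why rebooting did not help: a leak like this lives inside the running process and starts again from zero on every launch.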

Chani