I have to make simultaneous TCP socket connections to multiple machines every x seconds, in order to get something like a status update packet.

I use a Callable class. Each instance is submitted as a future task that connects to one machine, sends a query packet, and receives a reply. The reply is returned to the main thread, which creates all the Callable objects.

My socket connection class is:

public class ClientConnect implements Callable<String> {
    Connection con = null;
    Statement st = null;
    ResultSet rs = null;
    String hostipp, hostnamee; 
    ClientConnect(String hostname, String hostip) {
        hostnamee=hostname;
        hostipp = hostip;
    }
    @Override
    public String call() throws Exception {
        return GetData();
    }
    private String GetData()  {
            Socket so = new Socket();
            SocketAddress sa =  null;
            PrintWriter out = null;
            BufferedReader in = null;
        try {
            sa = new InetSocketAddress(InetAddress.getByName(hostipp), 2223);
        } catch (UnknownHostException e1) {
            e1.printStackTrace();
        }
        try {
            so.connect(sa, 10000);

            out = new PrintWriter(so.getOutputStream(), true);
            out.println("\1IDC_UPDATE\1");
            in = new BufferedReader(new InputStreamReader(so.getInputStream()));
            String [] response = in.readLine().split("\1");             
            out.close();in.close();so.close(); so = null;

            try{
                Integer.parseInt(response[2]);
            } catch(NumberFormatException e) {
                System.out.println("Number format exception");
                return hostnamee + "|-1" ;
            }

            return hostnamee + "|" + response[2];
        } catch (IOException e) {
            try {
                if(out!=null)out.close();
                if(in!=null)in.close();
                so.close();so = null;
                return hostnamee + "|-1" ;
            } catch (IOException e1) {
                // TODO Auto-generated catch block
                return hostnamee + "|-1" ;
            }
        }
    }
}

And this is how I create a pool of threads in my main class:

private void StartThreadPool()
{
    ExecutorService pool = Executors.newFixedThreadPool(30);
    List<Future<String>> list = new ArrayList<Future<String>>();
    for (Map.Entry<String, String> entry : pc_nameip.entrySet()) 
    {
        Callable<String> worker = new ClientConnect(entry.getKey(),entry.getValue());
        Future<String> submit = pool.submit(worker);
        list.add(submit);
    }
    for (Future<String> future : list) {
        try {
            String threadresult;
            threadresult = future.get();
            //........ PROCESS DATA HERE!..........//
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch (ExecutionException e) {
            e.printStackTrace();
        }
    }       
}

The pc_nameip map contains (hostname, hostip) pairs, and for every entry I create a ClientConnect object.

My problem is that when my list of machines contains, say, 10 PCs (most of which are not alive), I get a lot of timeout exceptions on the alive PCs, even though my timeout limit is set to 10 seconds.

If I force the list to contain a single working PC, I have no problem. The timeouts are pretty random, and I have no clue what's causing them.

All machines are on a local network. The remote servers were also written by me (in C/C++) and have been working in another setup for more than 2 years without any problems.

Am I missing something, or could it be an OS network restriction problem? I am testing this code on Windows XP SP3. Thanks in advance!



UPDATE:

After creating two new server machines and keeping one that was getting a lot of timeouts, I have the following results:

For 100 thread runs over 20 minutes:

NEW_SERVER1 : 99 successful connections / 1 timeout
NEW_SERVER2 : 94 successful connections / 6 timeouts
OLD_SERVER  : 57 successful connections / 43 timeouts

Other info:

  • I experienced a JRE crash (EXCEPTION_ACCESS_VIOLATION (0xc0000005)) once and had to restart the application.
  • I noticed that while the app was running, my network connection was struggling as I browsed the internet. I have no idea if this is expected, but I don't think a maximum of 15 threads is that many.

So, first of all, my old servers had some kind of problem. I have no idea what it was, since my new servers were created from the same OS image.

Secondly, although the timeout percentage has dropped dramatically, I still think it is unusual to get even one timeout in a small LAN like ours. But this could be a problem in the server application.

Finally, my view is that, apart from the old server's problem (I still cannot believe I lost so much time on that!), there must be either a server app bug or a JDK-related bug (since I experienced that JRE crash).

p.s. I use Eclipse as IDE and my JRE is the latest.

If any of the above rings a bell, please comment. Thank you.

-----EDIT-----

Could it be that PrintWriter and/or BufferedReader are not actually thread-safe?

----NEW EDIT 09 Sep 2013----

After re-reading all the comments, and thanks to @Gray and his comment:

When you run multiple servers does the first couple work and the rest of them timeout? Might be interesting to put a small sleep in your fork loop (like 10 or 100ms) to see if it works that way.

I rearranged the tree list of hosts/IPs and got some really strange results. It seems that an alive host placed at the top of the tree list, and thus first to start a socket connection, has no problem connecting and receiving packets without any delay or timeout.

By contrast, if an alive host is placed at the bottom of the list, with several dead hosts before it, it takes too long to connect, and with my previous timeout of 10 seconds it failed to connect. But after changing the timeout to 60 seconds (thanks to @EJP) I realised that no timeouts occur!

It just takes too long to connect (more than 20 seconds on some occasions). Something is blocking new socket connections, and it isn't that the hosts or the network are too busy to respond.

I have some debug data here, if you would like to take a look: http://pastebin.com/2m8jDwKL

ktsangop
  • I don't see any immediate bugs in your code. Any chance you are trying to connect to the same server over and over? Do you have a sense that the timeouts are happening after 10 seconds have passed? Are the timeouts at connect time or during the IO? – Gray Sep 04 '13 at 13:59
  • Each threaded connection is given 10 seconds to time out, and no other connection to the same server is started in the meantime. The timeouts are normal, occurring exactly 10 seconds after the connection attempt. I omitted the logging to keep my post short. All timeouts are at connect(). – ktsangop Sep 04 '13 at 14:05
  • When you run multiple servers does the first couple work and the rest of them timeout? Might be interesting to put a small sleep in your fork loop (like 10 or 100ms) to see if it works that way. – Gray Sep 04 '13 at 14:09
  • It doesn't seem that the order in which I access the servers has any importance. I tried adding a small sleep of 50ms, and it didn't have any effect. Thanks for the suggestion. – ktsangop Sep 04 '13 at 14:30
  • When you run multiple servers does the first couple work and the rest of them timeout? Some of the threads work but others time out? Can you try your program on a different architecture? – Gray Sep 04 '13 at 14:32
  • When my list of servers contains only one that is alive and accepting connections, my code works as expected. When my list contains 10 servers of which only 1 is alive, I get an expected timeout for the 9 dead ones, and an unexpected timeout for the one alive server. – ktsangop Sep 04 '13 at 14:39
  • This seems to work fine for me under Mac OSX. I put 1-3 local hosts into the map and then a bunch of invalid IPs that just hang. The 3 local hosts return quickly and then after 10 seconds the others time out. Very repeatable. Is it possible that you have a Windows firewall screwing something up? Here's my version of your program: http://pastebin.com/2wzz1VGh – Gray Sep 04 '13 at 22:50
  • Thank you for taking the time to test this. Since you and everyone else reading it cannot find anything wrong, I give up...! I'll have to set up another dummy server app to see if the problem is there. Thank you again for your effort. I will update my post as soon as I trace the problem. p.s. firewalls are off. – ktsangop Sep 05 '13 at 08:38
  • What do you mean by 'I get a lot of timeout exceptions ... *even though* my timeout limit is set to 10 seconds'? That's about 1/6 of the default. The shorter you set it, the *more* timeouts you will get. – user207421 Sep 06 '13 at 12:53
  • @EJP I suppose that in a LAN with 10 working computers max, such a high timeout value shouldn't be needed. Correct me if I am wrong. – ktsangop Sep 06 '13 at 13:39

3 Answers

You could simply check for availability before you connect to the socket. There is an answer that provides a somewhat hackish workaround: https://stackoverflow.com/a/10145643/1809463

Process p1 = java.lang.Runtime.getRuntime().exec("ping -c 1 " + ip);
int returnVal = p1.waitFor();
boolean reachable = (returnVal==0);

by jayunit100

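ping is available on both Unix and Windows, but the count flag differs: it is -c on Unix-like systems and -n on Windows, so the command above works as written only on Unix.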
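A portable version of this check has to pick the count flag from the OS name. The following is only a minimal sketch, assuming ping is on the PATH and that an exit status of 0 means the host replied. The class name PingCheck and its two methods are hypothetical and not part of the linked answer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not from the linked answer.
class PingCheck {
    // Builds a ping command that sends one echo request. The count flag is
    // "-n" on Windows and "-c" on Unix-like systems.
    static List<String> pingCommand(String osName, String ip) {
        boolean windows = osName.toLowerCase().startsWith("windows");
        return Arrays.asList("ping", windows ? "-n" : "-c", "1", ip);
    }

    // Returns true if ping exits with status 0. Assumes ping is on the PATH.
    static boolean isReachable(String ip) throws IOException, InterruptedException {
        List<String> cmd = pingCommand(System.getProperty("os.name"), ip);
        Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        // Drain the output so the child process cannot block on a full pipe.
        try (InputStream in = p.getInputStream()) {
            while (in.read() != -1) {
                // discard
            }
        }
        return p.waitFor() == 0;
    }
}
```

Passing the arguments as a list also avoids concatenating the IP into a single command string.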

mike
  • I could try this, but I don't think that this relates to my question. It could work, but it wouldn't protect me from unpredictable behaviour like this in the future. I would like a suggestion on why it happens first, and then try to solve it. Thanks – ktsangop Sep 04 '13 at 13:44
  • Then you need to provide more information. For example on your LAN. If a server is reachable, no timeout should occur. You should try to narrow down the problem and try to create the timeout on purpose. It will give some insight to the cause. – mike Sep 04 '13 at 13:52
  • OK, I will, but first I would like to know if there is any problem with my code. Do you see anything that could cause such behaviour? – ktsangop Sep 04 '13 at 13:58
  • Sorry, I don't see anything related to your problem. If a host is available, there should be no timeout. ...The only problem I have is that I find your use of so many cascaded try-catches a bit confusing, but that has nothing to do with the timeout. – mike Sep 04 '13 at 14:26
  • I accepted your answer, because nothing else worked. I know it's a hack, but using this method I can guarantee a small response time and no timeouts at the same time. Thank you again. – ktsangop Sep 10 '13 at 12:19
0

My problem is that when my list of machines contains, say, 10 PCs (most of which are not alive), I get a lot of timeout exceptions on the alive PCs, even though my timeout limit is set to 10 seconds.

So as I understand the problem, if you have (for example) 10 PCs in your map and 1 is alive and the other 9 are not online, all 10 connections time out. If you just put the 1 alive PC in the map, it shows up as fine.

This points to some sort of concurrency problem but I can't see it. I would have thought that there was some sort of shared data that was not being locked or something. I see your test code is using Statement and ResultSet. Maybe there is a database connection that is being shared without locking or something? Can you try just returning the result string and printing it out?

Less likely is some sort of network or firewall configuration but the idea that one failed connection would cause another to fail is just strange. Maybe try running your program on one of the servers or from another computer?

If I try your test code, it seems to work fine. Here's the source code for my test class. It has no problems contacting a combination of online and offline hosts.

Lastly some quick comments about your code:

  • You should close the streams, readers, and sockets in a finally block. Check my test class for a better pattern there.
  • You should return a small Result class instead of passing back a String that then has to be parsed.
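The two suggestions in this list could be combined roughly as follows. This is not the linked test class, only a minimal sketch assuming Java 7 or later, where try-with-resources does the job of the finally block. The names StatusResult and StatusClient are hypothetical. The query string, the \1 separator and the response field index are taken from the question, while the read timeout is an addition the question's code does not have.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical result type: host name plus parsed status, with -1 standing
// in for "no valid reply", as in the question.
final class StatusResult {
    final String hostname;
    final int status;

    StatusResult(String hostname, int status) {
        this.hostname = hostname;
        this.status = status;
    }
}

final class StatusClient {
    // Parses the third \1-separated field of a reply line. Returns -1 if
    // the line is missing, too short, or the field is not a number.
    static int parseStatus(String line) {
        if (line == null) return -1;
        String[] parts = line.split("\1");
        if (parts.length < 3) return -1;
        try {
            return Integer.parseInt(parts[2].trim());
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // try-with-resources closes the socket and both streams on every path,
    // including when connect() or readLine() throws.
    static StatusResult query(String hostname, String ip, int port, int timeoutMs) {
        try (Socket so = new Socket()) {
            so.connect(new InetSocketAddress(ip, port), timeoutMs);
            so.setSoTimeout(timeoutMs); // assumption: also bound the read
            try (PrintWriter out = new PrintWriter(so.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(so.getInputStream()))) {
                out.println("\1IDC_UPDATE\1");
                return new StatusResult(hostname, parseStatus(in.readLine()));
            }
        } catch (IOException e) {
            return new StatusResult(hostname, -1);
        }
    }
}
```

A Callable<StatusResult> could then simply return StatusClient.query(hostname, ip, 2223, 10000), and the main thread would read result.status without splitting a string.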

Hope this helps.

Gray
  • You got the point. The mysql variables are used by another function of the class which is executed only once at start up. Thanks for the suggestions. Right now I am setting up two new machines from scratch, to test them as servers. Will post some feedback soon. I wish I could upvote, but I don't have enough reputation :) – ktsangop Sep 05 '13 at 13:28
0

After a lot of reading and experimentation, I will have to answer my own question (if I am allowed to, of course).

Java just can't handle multiple concurrent socket connections without adding a big performance overhead, at least on a Core2Duo/4GB RAM/Windows XP machine.

Creating multiple concurrent socket connections to remote hosts (using, of course, the code I posted) creates some kind of resource bottleneck or blocking situation, the cause of which I still don't know.

If you try to connect to 20 hosts simultaneously and a lot of them are disconnected, then you cannot guarantee a "fast" connection to the alive ones. You will get connected, but it could be after 20-25 seconds, meaning that you'll have to set the socket timeout to something like 60 seconds (not acceptable for my application).

If an alive host is lucky enough to start its connection attempt first (bearing in mind that the concurrency is not absolute, since the for loop still submits tasks sequentially), then it will probably get connected very fast and get a response.

If it is unlucky, the socket.connect() method will block for some time, depending on how many of the hosts before it will eventually time out.
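One way to see how long connect() actually blocks for each host is to time the call directly. The following is only a sketch under that assumption. ConnectTimer and timeConnect are hypothetical names, and the IP, port and timeout would be whatever the host list already uses.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic helper.
final class ConnectTimer {
    // Returns the elapsed milliseconds for a successful connect, or -1 if
    // the attempt failed or timed out.
    static long timeConnect(String ip, int port, int timeoutMs) {
        long start = System.nanoTime();
        try (Socket so = new Socket()) {
            so.connect(new InetSocketAddress(ip, port), timeoutMs);
            return (System.nanoTime() - start) / 1_000_000L;
        } catch (IOException e) {
            return -1;
        }
    }
}
```

Logging this value per host, together with the host's position in the list, would show whether the delay grows with the number of dead hosts submitted before it.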

After adding a small sleep (100 ms) between the pool.submit(worker) calls, I realised that it makes some difference: I get to connect faster to the "unlucky" hosts. But if the list of dead hosts grows, the results are almost the same.
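The staggered submission described above might look roughly like this. It is a sketch rather than the exact code that was run. StaggeredSubmit and runStaggered are hypothetical names, and the pool size and delay are passed in as parameters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper illustrating a pause between submissions.
final class StaggeredSubmit {
    // Submits each task with a pause after every submission, then collects
    // the results in submission order.
    static <T> List<T> runStaggered(List<Callable<T>> tasks, int threads, long delayMs)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<T>> futures = new ArrayList<Future<T>>();
            for (Callable<T> task : tasks) {
                futures.add(pool.submit(task));
                Thread.sleep(delayMs); // e.g. 100 ms between connection attempts
            }
            List<T> results = new ArrayList<T>();
            for (Future<T> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            // The question's StartThreadPool never shuts its pool down, so
            // each periodic run would leave its worker threads behind.
            pool.shutdown();
        }
    }
}
```

The sleep only spaces out the start of each connection attempt. It does not limit how many attempts are pending at once, which may be why it helps less as the number of dead hosts grows.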

If I edit my host list and place a previously "unlucky" host at the top (before the dead hosts), all problems disappear...

So, for some reason the socket.connect() method creates a form of bottleneck when there are many hosts to connect to and they are not alive. Whether it is a JVM problem, an OS limitation, or bad coding on my side, I have no clue...

I will try a different coding approach and hopefully tomorrow I will post some feedback.

p.s. This answer made me think about my problem: https://stackoverflow.com/a/4351360/2025271

ktsangop