
I am seeing a lot of 503s at the Varnish end, and one hypothesis is that it is running out of TCP connections. I did a lot of googling (maybe my googling skills are really poor) but could not find how to check the currently allowed TCP connections per process and the current TIME_WAIT value. Here is the output of netstat:

```
$ netstat -an | wc -l
690
```

I am really new to network troubleshooting, so this could be a really naive question. Sorry about that.
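For the "how do I check" part of the question, here is a minimal sketch, assuming a Linux host with `/proc` mounted. It counts IPv4 TCP sockets per state from `/proc/net/tcp` (IPv6 sockets are listed in `/proc/net/tcp6` in the same format) and reads the per-process "Max open files" limit from `/proc/<pid>/limits`, since every socket a process holds open uses one file descriptor. Note that `netstat -an | wc -l` counts every output line, including headers and UNIX domain sockets, so 690 is not a TIME_WAIT count. The function names are hypothetical helpers, not part of any tool.

```python
from collections import Counter

# Hex state codes printed in the "st" column of Linux /proc/net/tcp
# (values of the kernel's TCP state enum).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count TCP sockets per state from the text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        code = fields[3].upper()  # 4th column ("st") is the socket state
        counts[TCP_STATES.get(code, code)] += 1
    return counts


def _limit_value(token):
    """Convert one limit column to an int, or None for 'unlimited'."""
    return None if token == "unlimited" else int(token)


def max_open_files(proc_limits_text):
    """Return (soft, hard) 'Max open files' from the text of /proc/<pid>/limits.

    Each open socket uses one file descriptor, so this caps how many
    connections the process can hold open at the same time.
    """
    for line in proc_limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()  # ['Max', 'open', 'files', soft, hard, 'files']
            return _limit_value(fields[3]), _limit_value(fields[4])
    return None


def read_text(path):
    """Read a whole /proc file as text."""
    with open(path) as f:
        return f.read()
```

Usage: `count_tcp_states(read_text("/proc/net/tcp"))["TIME_WAIT"]` gives the current number of IPv4 sockets in TIME_WAIT, and `max_open_files(read_text("/proc/<pid>/limits"))`, with `<pid>` replaced by the Varnish process ID, gives that process's soft and hard descriptor limits.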

Edit: As a lot of comments suggest this cannot possibly happen, I am adding more information here.

  1. I already checked the Tomcat access log at the backend and I don't see any 503s.
  2. The total time Varnish takes for the 503s is also low (around 1 ms), whereas a backend request generally takes around 30-40 ms.
  3. This happens when we see really high traffic.

Please comment if anyone needs more information.

kamalkishor1991
  • Have you tried [basic troubleshooting for those 503 errors](https://www.getpagespeed.com/troubleshooting/varnish-backend-fetch-failed)? What is the actual error as per ```varnishlog``` when it gives a 503 error? – Danila Vershinin Oct 03 '17 at 08:15
  • @DanielV. Yes, I have looked into these. The backend is healthy. Unfortunately, we have a custom log format that does not log enough information to troubleshoot the 503s, but based on these logs I can clearly see that the problem is not at the backend. – kamalkishor1991 Oct 03 '17 at 08:22
  • Nonsense. The backend issued the 503. Whatever the problem was, it *came from* the backend. – user207421 Oct 03 '17 at 09:15
  • That's what is surprising: no 503s are returned from the backend. – kamalkishor1991 Oct 03 '17 at 09:18
  • @kamalkishor1991 More nonsense. Nonsense on stilts. You **cannot possibly** be 'seeing a lot of HTTP 503' **unless** it came from the backend. – user207421 Oct 03 '17 at 09:22
  • I don't understand this. If the Varnish machine is out of resources, why can't it throw 503s? Because that's clearly what I am seeing here. – kamalkishor1991 Oct 03 '17 at 09:45
  • You are seeing 503 for a reason which you have not yet identified. You can't possibly have received 503 over a connection that couldn't be established. You are barking up the wrong tree entirely. You are just going to have to examine your server logs and find out the real reason. Instead of just guessing. – user207421 Oct 03 '17 at 10:01
  • Re your edit, nobody has suggested 'it cannot possibly happen'. It *is* happening. What we are telling you is that it cannot possibly happen *because of TIME_WAIT.* Please read what you're told here. – user207421 Oct 03 '17 at 10:04
  • Stack Overflow is a site for programming and development questions. This question appears to be off-topic because it is not about programming or development. See [What topics can I ask about here](http://stackoverflow.com/help/on-topic) in the Help Center. Perhaps [Super User](http://superuser.com/) or [Unix & Linux Stack Exchange](http://unix.stackexchange.com/) would be a better place to ask. – jww Oct 03 '17 at 14:26

2 Answers


> I am seeing lot of 503 at varnish end and one hypothesis is that it is running out of tcp connections.

Rubbish. That would only affect the client, in which case there would be no connection, no HTTP, and no 503.

> I did lot of googling(May be my googling skills are really poor) but did not find how to check current allowed tcp connections per process and current TIME_WAIT value?

Nothing to do with it. TIME_WAIT is normal. Preferably it occurs at the client, where it can't hurt, if both server and client are using HTTP/1.1 and the client is doing connection pooling. It doesn't have anything to do with HTTP 503.

user207421
  • The hypothesis is that Varnish is running out of connections while connecting to the backend (each TCP connection occupies one port), because all the 503s are cache misses. It is not a problem with the client. If TIME_WAIT is high, the system will hold the port for a long time. – kamalkishor1991 Oct 03 '17 at 09:17
  • @kamalkishor1991 There is nothing but misinformation in your comment. If you received HTTP 503, you received it via HTTP, which runs over TCP, so you had a TCP connection, so you didn't run out of TCP connections. Your hypothesis doesn't begin to make sense. Each TCP connection to the backend does not occupy one port. I didn't say it was a problem with the client: in fact it is completely the opposite.
  • Each HTTP (TCP) connection occupies one port when it connects to the backend; that's how you keep track of different backend connections. Let's say the system allows only n ports to Varnish and Varnish is trying to make more than n connections to the backend. It will fail to make the (n + 1)th connection and return a 503 for that request. – kamalkishor1991 Oct 03 '17 at 09:41
  • @kamalkishor1991 None of that can possibly explain how you magically received the characters `503` over a connection that couldn't be established. The problem is in the HTTP server, and certainly *not*, by *definition*, at the TCP level, and therefore nothing to do with TIME_WAIT either.
  • I suppose Varnish will construct the 503 for you. That's what happens when your backend machine is completely down, right? – kamalkishor1991 Oct 03 '17 at 09:48
  • How? How can you receive 503 over a connection that wasn't established? You haven't addressed this fundamental issue, despite it being stated five or six times. You're barking up the wrong tree. NB: 503 is *sent*, over an *HTTP* and *TCP connection,* not 'thrown'.
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/155815/discussion-between-kamalkishor1991-and-ejp). – kamalkishor1991 Oct 03 '17 at 09:50
  • No thanks. I don't do chat, and it is already evident that I am merely wasting my valuable time. – user207421 Oct 03 '17 at 09:50

Firstly, a socket in the TIME_WAIT state does not consume application file descriptors (TIME_WAIT is a natural and perfectly normal TCP/IP state). Once the close has completed, the file descriptor is released for reuse. TIME_WAIT connections do consume kernel resources, but not to a noticeable degree: a small amount of kernel memory is used to keep track of the sockets and their states, and by modern operating-system standards it is tiny. The most notable impact is that the range of TCP port numbers is finite. To get a decently sized ephemeral port range, set `net.ipv4.ip_local_port_range` to a suitable value.
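To put numbers on "the range of TCP port numbers is finite", here is a rough back-of-the-envelope sketch. It assumes Linux, where TIME_WAIT lasts a fixed 60 seconds (`TCP_TIMEWAIT_LEN` in the kernel source), a single backend ip:port, and that the Varnish host is the side closing each backend connection, so the TIME_WAIT lands on it. The function names are illustrative only.

```python
# Linux keeps an actively closed socket in TIME_WAIT for 60 seconds
# (TCP_TIMEWAIT_LEN in the kernel source).
TIME_WAIT_SECONDS = 60


def parse_port_range(text):
    """Parse /proc/sys/net/ipv4/ip_local_port_range (two whitespace-separated ints)."""
    low, high = (int(v) for v in text.split())
    return low, high


def ephemeral_port_count(low, high):
    """Number of local ports available for outgoing connections (inclusive range)."""
    return high - low + 1


def max_new_connections_per_second(low, high, time_wait=TIME_WAIT_SECONDS):
    """Rough ceiling on the sustained rate of NEW connections to ONE remote ip:port.

    Assumes every connection is closed by this host, so each local port sits
    in TIME_WAIT for `time_wait` seconds before it can be reused for the same
    destination.
    """
    return ephemeral_port_count(low, high) / time_wait
```

With a common Linux default range of 32768-60999, that is 28,232 ports / 60 s, roughly 470 new connections per second to a single backend ip:port. Reusing backend connections (keep-alive), or enabling `net.ipv4.tcp_tw_reuse`, raises the effective ceiling because fewer ports sit idle in TIME_WAIT.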

mohsen.b