2

Oh, the joyous question of HTTP vs. WebSockets is at it again. However, even after quite a bit of reading through the hundreds of versus blog posts, SO questions, etc., I'm still at a complete loss as to what I should be working towards for our application. In this post I'll supply information on our application's functionality and the types of requests/responses it currently uses.

Currently our application is a sloppy piece of work, thrown together using AngularJS and AJAX requests to an Apache server running PHP (namely XAMPP). Since the launch of our application I've noticed that we have problems with response times whenever the server is under any kind of load. That probably has something to do with the sloppy architecture of our server, the hardware, and the fact that our MySQL database isn't exactly optimized.

However, with such a loyal fanbase, and investors seeing potential in our application and giving us a chance to roll out a 2.0, I've been studying hard on how to turn this application into a powerhouse of low-latency scalability. Honestly, the best option would be to hire someone with experience, but unfortunately I'm a hobbyist and a one-man army without much of it.

After some extensive research, I've decided to write the backend in NodeJS this time. However, I'm having a hard time deciding between HTTP and WebSockets. Here are the types of transactions done between the server and client:

  • The client sends a request to the server in JSON format. The request contains a few things:

    • A request ID (for processing logic based on the request)
    • The data associatedated with the request ID
  • The server receives the request, queries the database (if necessary), and then responds to the client in JSON format. Sometimes the server serves files to the client, namely images in Base64 format.
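The JSON transactions described above might look like the following sketch. The field names (`requestId`, `data`, `status`) are illustrative assumptions, not the app's actual schema:

```javascript
// Hypothetical request/response envelopes for the JSON protocol
// described above -- field names are assumptions for illustration.
const request = {
  requestId: 'GET_NOTIFICATIONS',           // drives the server's processing logic
  data: { userId: 42, since: 1444860000 }   // payload associated with the request ID
};

const response = {
  requestId: 'GET_NOTIFICATIONS',
  status: 'ok',
  data: { notifications: [] }               // sometimes a Base64-encoded image instead
};

// Both sides exchange serialized JSON over the wire:
const wire = JSON.stringify(request);
const parsed = JSON.parse(wire);
console.log(parsed.requestId); // → GET_NOTIFICATIONS
```

The same envelope works unchanged over AJAX or a WebSocket, which keeps the two transports interchangeable while deciding between them.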

Currently the application (when in use) sends a request to the server every time an interface changes, which for our application averages once every few seconds. Every action on our interfaces sends another request to the server. The application also sends a request to check for notifications/messages every 8 seconds (or every 2 seconds if the user is on the messaging interface).

Currently, here are the benefits I see of a stateful connection over a stateless connection for our application:

  • If the connection is stateful, I can eliminate the polling requests for notifications and messages, as the server can just tell the client whenever one becomes available. This alone could eliminate x(n)/4 requests per second to the server.

  • Handling something like a disconnection from the server is as simple as attempting to reconnect, as opposed to handling timeouts/errors per request; this would only need to be handled on the socket.

  • Additional security can be obtained by removing the security keys for database interaction; this should prevent the possibility of hijacking a session_key and using it to manipulate or access another user's data. The session_key is only needed because there is no state in the AJAX setup.

However, I'm someone who started learning programming through TCP game-server emulation, so I understand some benefits of a STATEFUL connection, while I don't understand the benefits of a STATELESS connection very well at all. I know they both have their benefits and quirks, but I'm curious which would be the best approach for us.

We're mainly looking for scalability, as we had a local application launch and managed to hit a bottleneck at nearly 10,000 users in under 48 hours. Luckily I announced it as a BETA, and the users are cutting me a lot of slack after learning that I did it all on my own as a learning project. I've disabled registrations while looking into improving the application's front and back end.

IMPORTANT:

If we use WebSockets, would we be able to download pictures from the server asynchronously like we can with AJAX? For example, with AJAX I can make 5 requests to the server for 5 different images, and they all start downloading immediately. With a stateful connection, would I have to wait for each photo to finish streaming before moving to the next request? Would this bottleneck only a single user, or every user waiting on a request to complete?
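For comparison, this is the AJAX behavior the question describes: five independent requests that overlap in time rather than queueing. The downloads are simulated with timers so the sketch is self-contained:

```javascript
// Simulated download: resolves with the file name after `ms` milliseconds.
function fakeDownload(name, ms) {
  return new Promise(resolve => setTimeout(() => resolve(name), ms));
}

// Five requests started together, as with five parallel AJAX calls:
// none of them waits for another to finish.
function downloadAll(names) {
  return Promise.all(names.map(n => fakeDownload(n, 50)));
}

downloadAll(['1.jpg', '2.jpg', '3.jpg', '4.jpg', '5.jpg']).then(images => {
  // All five complete after ~50 ms total, not 5 × 50 ms.
  console.log(images.length); // → 5
});
```

Over a single WebSocket, by contrast, frames for one large message occupy the socket until that message completes, so matching this behavior requires either multiple sockets or a chunking sub-protocol.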

Hobbyist
  • Sorry to make your life harder, but I would reconsider `Node.js`... `Node.js` is a single-threaded, reactor-based framework, which is amazing and wonderful UNLESS you have long processing jobs - such as unoptimized database queries which cause your server (instead of the database) to organize or process the data. If you have ANY requests that take a while to process (and your post indicates that you do), `Node.js` might not be the optimal choice and might cause your server to hang each time a client requests something that takes a long time to process... (either that or you pay more for scaling). – Myst Oct 14 '15 at 14:52
  • @Myst - I'm well aware of the risks that come with NodeJS; the queries can be optimized with ease, as can the database structure. It's all going to be redone during this transition, as I'll be moving from MySQL to MongoDB. Also, it's fair to note that Node can make use of all processors using `Sticky-session`. I have my own machine already built that is sitting at a server-hosting station; although technically a high-spec desktop, I was lucky enough to know people to get a 10Gbit connection and management for my 'desktop-server'. 1TB SSD, 32GB RAM, Debian, quad-core 4.6GHz processor. – Hobbyist Oct 14 '15 at 15:30
  • Please note that half the performance issues in the original version of the application were because I was being completely lazy with the development. I was doing excessive memory copying, storing raw file data inside the database instead of linking to the filesystem, etc. Obviously it's going to stutter. Heh. – Hobbyist Oct 14 '15 at 15:31
  • Cool! Sounds like you thought this through. It's just something people often forget when moving to Node, and they end up using long blocking calls instead of short sections with chained callbacks... then they wonder why their apps aren't as responsive as they could be. – Myst Oct 14 '15 at 15:35
  • @Myst - Yeah, this project was one of those "It's not like you could make a social network" -> "Yes I can" -> "Prove it" type of scenarios, that actually took off when it was honestly just some trash to prove a point to a couple of friends. Nothing in the original project was thought through. Just mashed-together code that cobbled together what was expected to be a 4-5 user demonstration. Turned out to be a hit before I knew what happened. – Hobbyist Oct 14 '15 at 16:12
  • Nice :-) I would probably write another answer about how Websockets are better for larger-scale applications, but I'm guessing you already read some of my answers on the subject. If you're using the file system and leveraging static files served directly by a load balancer or a heavyweight server, I would keep HTTP for file serving. But I would still avoid AJAX (almost at all costs) and have the Websockets layer tell the client when to download an updated file. Remember that on larger scales, AJAX behaves exactly the same as a DoS attack. – Myst Oct 14 '15 at 16:43
  • @Myst - Lucky for me I'm not serving files unless the client requests them. They're also fairly small (512x512 JPEG format), being sent through Base64 encoding. I was thinking about this though, having a file-server that was separate from the web sockets. Still concerned as to how data-transfer over web sockets is going to be with NodeJS if I start transferring files. Refer to my last paragraph in the question. – Hobbyist Oct 14 '15 at 16:56
  • Websockets are bi-directional, but unless you write your own sub-protocol (which isn't super difficult), all data is linear, meaning you will wait for one download to finish before another begins (much like HTTP pipelining vs. HTTP/2 multiplexing). Hence, for file transfers you might prefer to use the HTTP layer (which allows for multiple connections as well as caching by intermediaries such as proxies and routers)... But since the files are small and since HTTP suffers from similar issues, it might not matter all that much. – Myst Oct 14 '15 at 17:11
  • One thing to note is that WebSockets won't necessarily cross proxy servers. Squid at least had an issue where it just wouldn't handle WebSockets at all. – Joel C Oct 14 '15 at 19:17
  • This was a known issue when the Websockets protocol was first devised. To resolve it, use `wss` instead of `ws`. Having a secure TLS connection causes traffic to "pass through" proxies and routers that lack Websocket support... This is a very effective and common solution, since intermediaries (proxies/routers) shouldn't be able to read and "correct" (nor cache) the stream (putting aside "man-in-the-middle" attacks and issues of the sort). – Myst Oct 14 '15 at 19:27
  • @Myst Well, I wanted to continue this chat in a discussion, but apparently that's not acceptable (as SO removed the offer). What I'm thinking of doing for the images is hosting a small HTTP server on the Node stack that is ONLY used for serving the images. The images can be requested through AJAX calls. The images are going to have to get from the server to the users one way or another, and handling it asynchronously (and not blocking the server flow) is going to be the best way, I think. Yeah, tremendous amounts of AJAX requests are similar to a DoS attack, but if someone is going to... – Hobbyist Oct 14 '15 at 21:39
  • ... do it intentionally, they're going to do it regardless. If the server can't process that amount of natural requests, there's another issue to be resolved. – Hobbyist Oct 14 '15 at 21:39
  • I think you have it pretty much covered. As long as you don't use timers (short/long polling) and initiate AJAX requests only in response to Websocket events (or during initial page load), you should be golden... An Afterthought: if your server supports direct static file serving, you can bypass the application (node.js stack) to answer the AJAX requests and use timestamps instead (usually added to the request as an inline query parameter (i.e. `file.data?time=2015...`). – Myst Oct 15 '15 at 06:44
  • @Myst - That's interesting. I've never really heard anything about this method, but I'm just a hobbyist without any educational background too ;). I'm not sure what you mean by 'static file serving', but I'm sure I can set it up, considering I own the machine. What I understand, though, is that the WebSocket server would send a timestamp, and the client would respond to the WebSocket and send an AJAX request using the timestamp (instead of an image ID). Once the timestamp is used, it's no longer valid, thus eliminating unnecessary data transfer. Am I in the ballpark? – Hobbyist Oct 15 '15 at 17:28
  • Not exactly.... After setting up Apache (or Nginx or whatever) to send static files independently of the node.js stack: 1. Your Websocket server pushes the image's URL and its timestamp (when it was last updated). 2. The client receives the data and requests the image using AJAX with the URL AND the timestamp (**always** with the timestamp). 3. The HTTP proxy/router/server cannot use the cached image because of the new timestamp and creates **a new cache** for the new timestamp... ...this is why you always use the same timestamp for an unchanged image - leveraging caching. – Myst Oct 15 '15 at 20:25
  • P.S. It's an old technique. It was more common when Rails was still 2.0 and HTTP servers didn't handle caching headers as well as they do today... but it should still work. – Myst Oct 15 '15 at 20:26
  • @Myst - the way that I have it set up right now is like so: whenever the user needs to download an image, the WebSocket server creates a `uuid` for the image and stores it in a `MongoDB` collection. This "document" contains the image's storage location on the filesystem, the uuid of the image, and the time of creation. I have the server poll the collection every 30 minutes and issue a delete for old requests that were never used (for whatever reason). A request is also deleted when it is used. I'm serving the images through NodeJS's HTTP server running in parallel with the WebSocket server. – Hobbyist Oct 15 '15 at 22:18
  • But I'm using sticky-sessions, so I'll bind the HTTP server to its own core. Images would be downloaded like `https://domain.com/rest/uuid`. Opinions? – Hobbyist Oct 15 '15 at 22:18
  • This sounds a bit resource-heavy to me, especially if you have access to a persistent file system. You might consider the fact that between you and the client are many intermediaries and helpers (browsers/ISPs/routers/proxies), all wanting to help you by using advanced caching algorithms... Your solution avoids persistent URLs for identical resources, thereby refusing any caching and help offered by all these machines. Why not simply save and overwrite the image on the hard disk and ask all these machines to refresh their cache by providing an updated URL (that's the timestamp concept)? – Myst Oct 15 '15 at 23:05
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/92438/discussion-between-hobbyist-and-myst). – Hobbyist Oct 15 '15 at 23:10
  • P.S. It's really up to you... But as a rule of thumb, simpler solutions usually result in better performance. Polling the database and keeping track of UUIDs and using timeouts, that's generally more complex than allowing direct 'read' access using a persistent URL (whether or not the data is stored in a DB requiring authentication or as a file). – Myst Oct 15 '15 at 23:11
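The sub-protocol idea raised in the comments above (so that several transfers can interleave on one socket instead of queueing linearly) amounts to tagging every frame with a request ID and routing frames back to their owners. This is an illustrative scheme, not the thread's actual protocol:

```javascript
// Minimal multiplexing sub-protocol sketch (an illustrative assumption):
// every frame carries its request id, so several transfers can
// interleave on a single WebSocket connection.
let nextId = 0;

function frame(payload) {
  return JSON.stringify({ id: ++nextId, payload });
}

// The receiving side routes each frame back to the request that owns it.
function route(rawMessage, pending) {
  const { id, payload } = JSON.parse(rawMessage);
  if (pending.has(id)) pending.get(id)(payload);
}

const pending = new Map();
const results = [];
pending.set(1, p => results.push(p)); // callback for request 1
pending.set(2, p => results.push(p)); // callback for request 2

// Frames for two different requests can arrive in any order:
const f1 = frame({ chunk: 'image-1-part' });
const f2 = frame({ chunk: 'image-2-part' });
route(f2, pending);
route(f1, pending);
console.log(results.length); // → 2
```

This is essentially what HTTP/2 multiplexing does at the protocol level; hand-rolling it over a WebSocket works, but for file transfers the comments' suggestion of leaving them on the HTTP layer is usually simpler.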

2 Answers


It all boils down to how your application works and how it needs to scale. I would use bare WebSockets rather than any wrapper, since the API is already easy to use and your hands won't be tied when you need to scale out.

Here are some links that will give you insight, though not concrete answers to your questions, because as I said, it depends on your expectations.

Hard downsides of long polling?

WebSocket/REST: Client connections?

Websockets, and identifying unique peers [PHP]

How HTML5 Web Sockets Interact With Proxy Servers

vtortola

If your question is "Should I use WebSockets instead of HTTP?", the answer is: you should not.

Even if it is faster because you don't lose time opening connections, you also lose everything the HTTP specification gives you: verbs (GET, POST, PATCH, PUT, ...), paths, bodies, responses, and status codes. This seems simple, but you'll have to re-implement all or part of these protocol features yourself.
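A sketch of what re-implementing HTTP semantics over a raw socket looks like in practice: a hand-rolled envelope carrying the verb, path, and status code that HTTP would otherwise provide for free (the envelope format here is an illustrative assumption):

```javascript
// Hand-rolled HTTP-like envelope for a raw WebSocket (illustrative).
function wsRequest(method, path, body) {
  return JSON.stringify({ method, path, body });
}

function wsResponse(status, body) {
  return JSON.stringify({ status, body });
}

// A do-it-yourself router replacing what HTTP servers give you for free:
// method dispatch, path matching, and status codes.
function handle(raw) {
  const { method, path } = JSON.parse(raw);
  if (method === 'GET' && path === '/user/42') {
    return wsResponse(200, { name: 'alice' });
  }
  return wsResponse(404, null);
}

const reply = JSON.parse(handle(wsRequest('GET', '/user/42', null)));
console.log(reply.status); // → 200
```

And that is only routing: caching headers, content negotiation, redirects, and intermediary support would all need the same treatment.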

So you should use Ajax as long as it is a one-off, punctual request.

When you need to make an Ajax request every 2 seconds, what you actually need is for the server to send you data, not for YOU to poll the server to check whether the API has changed. That is a sign you should implement a WebSocket server.

Alcalyn