Why are WebSockets masked?

Question

I was following MDN's guide on writing a WebSocket server. The guide is pretty straightforward and easy to understand...

However, while following this tutorial I came across the frame format that WebSocket messages from the client are sent in:


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
|     Extended payload length continued, if payload len == 127  |
+ - - - - - - - - - - - - - - - +-------------------------------+
|                               |Masking-key, if MASK set to 1  |
+-------------------------------+-------------------------------+
| Masking-key (continued)       |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data continued ...                |
+---------------------------------------------------------------+

After writing some functions to properly unmask the frames and data sent by the client, I started wondering why the data is masked to begin with. After all, you don't have to mask data you're sending from the server...

If someone were intercepting the data for bad reasons, it would be relatively easy to unmask it because the masking key is included with the message. Even if they didn't have the key, the masking key in the frame is only 4 bytes long. Someone could easily unmask the data since the key is so small.
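To make the point concrete, here is a minimal sketch of the masking operation, assuming the 4-byte key has already been read out of the frame. The function name `apply_mask` is just an illustration, not something from the MDN guide. The sample key and bytes are the masked "Hello" example from RFC 6455.

```python
def apply_mask(payload: bytes, key: bytes) -> bytes:
    """XOR each payload byte with the masking key, cycling through its 4 bytes."""
    if len(key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))


key = bytes([0x37, 0xFA, 0x21, 0x3D])
masked = apply_mask(b"Hello", key)
print(masked.hex())             # 7f9f4d5158
print(apply_mask(masked, key))  # b'Hello'
```

XOR is its own inverse, so the same function masks and unmasks. Anyone who can see the frame can see the key, which is why masking offers no confidentiality.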

Another reason I'm wondering why the data is masked is that you can protect your WebSocket data far better than masking does by using WSS (WebSockets Secure), which runs over TLS/SSL just like HTTPS.

Am I missing the point of why WebSockets are masked? Seems like it just adds pointless struggle to unmask the data sent by the client when it doesn't add any security to begin with.

Partial answer here: [What is the mask in a webSocket frame](http://stackoverflow.com/questions/14174184/what-is-the-mask-in-a-websocket-frame) and [Is masking really necessary when sending from webSocket client](http://programmers.stackexchange.com/questions/226343/is-masking-really-necessary-when-sending-from-websocket-client) and [How does websocket framing protect against cache poisoning](http://security.stackexchange.com/questions/36930/how-does-websocket-frame-masking-protect-against-cache-poisoning). — jfriend00, Oct 21 '15 at 03:16
@jfriend00, It does provide some insight as to why the standard defines it. But with my arguments I still don't understand why it's required by the client. — , Oct 21 '15 at 03:19
Also [WebSockets - Why do we need to mask data from client to server?](http://www.lenholgate.com/blog/2011/07/websockets---why-do-we-need-to-mask-data-from-client-to-server.html) — jfriend00, Oct 21 '15 at 03:20
Now have four reference articles in my two previous comments. Do you not see a rationale in any of them? — jfriend00, Oct 21 '15 at 03:21
In short: Masking is not to protect the data from being read. It is to protect servers (including proxy servers) from malicious use of WebSockets. — , Oct 21 '15 at 03:24
The answer from http://stackoverflow.com/questions/14174184/what-is-the-mask-in-a-websocket-frame does answer my question; however, it doesn't feel right marking this question as a duplicate because they're not the same questions. — , Oct 21 '15 at 03:30

Myst · Accepted Answer · 2016-09-16T21:48:43.673

14

jfriend00's comment has great links to good information...

I do want to point out the somewhat obvious, to show that masking unencrypted WebSocket connections is a necessary requirement rather than just a beneficial one:

Proxies, routers and other intermediaries (esp. ISPs) often read the requests sent by a client and "correct" any issues, add headers and otherwise "optimize" network resource consumption (for example, by responding from cache).

Some headers and request types (such as Connect) are often directed at these intermediaries rather than the endpoint server.

Since many of these devices are older and unaware of the WebSocket protocol, clear text that looks like an HTTP request might be edited or acted upon.

Hence, it was necessary that clear text would be "shifted" to unrecognized bytes, to initiate a "pass through" rather than "processing".

Beyond that, the masking key had to be unpredictable, so that attackers couldn't "reverse" the masking in advance and still get malicious frames onto the wire.

As for requiring wss instead of masking - I know this was considered during the writing of the standard... but until certificates are free, this would make any web standard requiring SSL/TLS a "rich man's" standard rather than an internet wide solution.

As for "why mask wss data?" - I'm not sure about this one, but I suspect that it is meant to allow the parser to be connection agnostic and easier to write. In clear text, unmasked frames are a protocol error and result in a disconnection initiated by the server. Having the parser behave the same, regardless of the connection, allows us to separate the parser from the raw IO layer, making it connection agnostic and offering support for event based programming.
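As a rough illustration of the "unmasked frames are a protocol error" rule, here is a sketch of a server-side parser for a single client frame. It assumes the whole frame is already in one buffer, and the names `parse_client_frame` and `ProtocolError` are made up for this example. A real server would close the connection where this sketch raises.

```python
import struct
from typing import Tuple


class ProtocolError(Exception):
    """Raised when a client frame violates the framing rules."""


def parse_client_frame(frame: bytes) -> Tuple[int, bytes]:
    """Return (opcode, unmasked payload) for one complete client frame."""
    if len(frame) < 2:
        raise ProtocolError("frame too short")
    opcode = frame[0] & 0x0F          # low 4 bits of the first byte
    masked = bool(frame[1] & 0x80)    # MASK bit
    length = frame[1] & 0x7F          # 7-bit payload length
    offset = 2
    if length == 126:                 # 16-bit extended length follows
        (length,) = struct.unpack_from("!H", frame, offset)
        offset += 2
    elif length == 127:               # 64-bit extended length follows
        (length,) = struct.unpack_from("!Q", frame, offset)
        offset += 8
    if not masked:
        raise ProtocolError("client frame is not masked")
    key = frame[offset:offset + 4]
    offset += 4
    payload = frame[offset:offset + length]
    if len(key) != 4 or len(payload) != length:
        raise ProtocolError("incomplete frame")
    return opcode, bytes(b ^ key[i % 4] for i, b in enumerate(payload))


# Masked text frame carrying "Hello" (the example bytes from RFC 6455).
print(parse_client_frame(bytes.fromhex("818537fa213d7f9f4d5158")))  # (1, b'Hello')
```

Nothing in the function depends on whether the bytes arrived over ws or wss, which is the connection-agnostic property described above.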

edited Sep 16 '16 at 21:48

answered Oct 21 '15 at 15:12

Myst


Proxies needed to change to understand the Upgrade protocol anyway; it would be trivial for proxies to also stop processing the stream further, whereas it is hard for servers to implement masking, and its ongoing cost lasts forever since debugging is a pain. Clients *all* need to be upgraded to support this complicated protocol. I think the real answer is that Google have a competitive advantage due to scale if web technologies are harder to implement in the server. It's not a *requirement* at all. – teknopaul Nov 12 '17 at 21:15
@teknopaul, I understand your sentiment. However, I'm pretty certain that it isn't correct that "Proxies needed to change to understand the Upgrade protocol anyway"... Many proxies had a pass-through fallback built in and could work "as is" as long as the data was "mangled" (wasn't recognized as HTTP headers). Older proxies had an issue with the `Connection` header (they failed to forward it), but that's not all proxies. To the best of my knowledge, many proxies haven't been updated to this day and many of them (but not all) work fine with WebSocket Upgrades. – Myst Nov 13 '17 at 02:52
@teknopaul - P.S., unmasking might be an annoying resource concern due to memory cache misses, but it's a simple 4-byte XOR operation in a loop... it's really quite easy to implement and in most languages doesn't require the data to be copied (I wrote a parser for Ruby, which caused String copies, but my C parser was easier to author and no copying is involved). – Myst Nov 13 '17 at 02:55
I'm writing a streaming C parser, I don't get the data in chunks that are divisible by 4. I'm XORing byte by byte and recording position in the 4byte mask, with some more code I could optimize to XOR in the word size of the machine the code is running. All pointless effort. WebSockets could simply have been a single header Upgrade:websockets Nothing else was a requirement AFAICS. Proxies have to change, no biggie, everyone's life is easier, sensible folk can use text based protocols, as the web intended. Google can use binary protocols and XORing if they so wish to bypass unpatched proxies. – teknopaul Nov 16 '17 at 14:48
XORing does not guarantee anything about the data you send. For example, you can have 0x0 0x0 0x0 0x0 as the mask and the data is sent as is, so it's incorrect to suggest that without XORing there is some problem and with it there is a solution to this problem. The masking is an unnecessary annoyance, and if it isn't, it's broken because of the possibility of 0000 masks. – teknopaul Nov 16 '17 at 14:53
@teknopaul, 2 things: 1. if you're looking for XORing logic [you can copy the one from the C parser I authored](https://github.com/boazsegev/facil.io/blob/78fe92d1156c9ac5dfe3324f8ea46f6922cd389b/lib/facil/http/websocket_parser.h#L144-L200), it assumes a 64 bit architecture and protects against unaligned memory access... or you can just use the whole parser, it's under the MIT license; 2. `0000` are not really valid masks... as the data is left "unmasked"... not that the 2^32 risk factor matters once fall through was initialized on the first frame. – Myst Nov 16 '17 at 18:22
I'm not looking for code. I'm looking to answer "why WebSockets are masked?" My answer is that there is no good reason. – teknopaul Nov 25 '17 at 21:47
@teknopaul, I'm not here to fight, I'm here to help. – Myst Nov 25 '17 at 22:47
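For readers following the streaming discussion in the comments above, here is a small sketch of unmasking data that arrives in chunks of arbitrary size by remembering the position within the 4-byte key. It is written in Python rather than C for brevity, and the class name `StreamingUnmasker` is hypothetical. It does not include the word-size optimization mentioned in the comments.

```python
class StreamingUnmasker:
    """Unmask one frame's payload as it arrives in arbitrarily sized chunks."""

    def __init__(self, key: bytes) -> None:
        if len(key) != 4:
            raise ValueError("masking key must be exactly 4 bytes")
        self._key = key
        self._pos = 0  # number of payload bytes already processed in this frame

    def feed(self, chunk: bytes) -> bytes:
        # Continue the key cycle from wherever the previous chunk stopped.
        out = bytes(
            b ^ self._key[(self._pos + i) % 4] for i, b in enumerate(chunk)
        )
        self._pos += len(chunk)
        return out


key = bytes([0x37, 0xFA, 0x21, 0x3D])
masked = bytes.fromhex("7f9f4d5158")  # masked "Hello" from RFC 6455

unmasker = StreamingUnmasker(key)
print(unmasker.feed(masked[:2]) + unmasker.feed(masked[2:]))  # b'Hello'
```

A fresh instance is needed per frame, since each frame carries its own key and the position restarts at zero.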

score 4 · Answer 2 · answered Jul 30 '18 at 09:19

Actually the definitive RFC, RFC 6455 The WebSocket Protocol, has an explanation. I quote it here:

 10.3.  Attacks On Infrastructure (Masking)

   In addition to endpoints being the target of attacks via WebSockets,
   other parts of web infrastructure, such as proxies, may be the
   subject of an attack.

   As this protocol was being developed, an experiment was conducted to
   demonstrate a class of attacks on proxies that led to the poisoning
   of caching proxies deployed in the wild [TALKING].  The general form
   of the attack was to establish a connection to a server under the
   "attacker's" control, perform an UPGRADE on the HTTP connection
   similar to what the WebSocket Protocol does to establish a
   connection, and subsequently send data over that UPGRADEd connection
   that looked like a GET request for a specific known resource (which
   in an attack would likely be something like a widely deployed script
   for tracking hits or a resource on an ad-serving network).  The
   remote server would respond with something that looked like a
   response to the fake GET request, and this response would be cached
   by a nonzero percentage of deployed intermediaries, thus poisoning
   the cache.  The net effect of this attack would be that if a user
   could be convinced to visit a website the attacker controlled, the
   attacker could potentially poison the cache for that user and other
   users behind the same cache and run malicious script on other
   origins, compromising the web security model.

   To avoid such attacks on deployed intermediaries, it is not
   sufficient to prefix application-supplied data with framing that is
   not compliant with HTTP, as it is not possible to exhaustively
   discover and test that each nonconformant intermediary does not skip
   such non-HTTP framing and act incorrectly on the frame payload.
   Thus, the defense adopted is to mask all data from the client to the
   server, so that the remote script (attacker) does not have control
   over how the data being sent appears on the wire and thus cannot
   construct a message that could be misinterpreted by an intermediary
   as an HTTP request.

   Clients MUST choose a new masking key for each frame, using an
   algorithm that cannot be predicted by end applications that provide
   data.  For example, each masking could be drawn from a
   cryptographically strong random number generator.  If the same key is
   used or a decipherable pattern exists for how the next key is chosen,
   the attacker can send a message that, when masked, could appear to be
   an HTTP request (by taking the message the attacker wishes to see on
   the wire and masking it with the next masking key to be used, the
   masking key will effectively unmask the data when the client applies
   it).

   It is also necessary that once the transmission of a frame from a
   client has begun, the payload (application-supplied data) of that
   frame must not be capable of being modified by the application.
   Otherwise, an attacker could send a long frame where the initial data
   was a known value (such as all zeros), compute the masking key being
   used upon receipt of the first part of the data, and then modify the
   data that is yet to be sent in the frame to appear as an HTTP request
   when masked.  (This is essentially the same problem described in the
   previous paragraph with using a known or predictable masking key.)
   If additional data is to be sent or data to be sent is somehow
   changed, that new or changed data must be sent in a new frame and
   thus with a new masking key.  In short, once transmission of a frame
   begins, the contents must not be modifiable by the remote script
   (application).

   The threat model being protected against is one in which the client
   sends data that appears to be an HTTP request.  As such, the channel
   that needs to be masked is the data from the client to the server.
   The data from the server to the client can be made to look like a
   response, but to accomplish this request, the client must also be
   able to forge a request.  As such, it was not deemed necessary to
   mask data in both directions (the data from the server to the client
   is not masked).

   Despite the protection provided by masking, non-compliant HTTP
   proxies will still be vulnerable to poisoning attacks of this type by
   clients and servers that do not apply masking.
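The predictable-key problem in the quoted section can be illustrated with a short sketch. The request line and the key values are invented for the example, and `apply_mask` is a hypothetical helper, not part of any library. The last lines show the RFC's suggested remedy, here assuming Python's `secrets` module as the cryptographically strong source.

```python
import secrets


def apply_mask(data: bytes, key: bytes) -> bytes:
    """XOR data with a 4-byte key, cycling through the key."""
    return bytes(b ^ key[i % 4] for i, b in enumerate(data))


# Hypothetical bytes the attacker wants an intermediary to see on the wire.
wanted_on_wire = b"GET /tracker.js HTTP/1.1\r\n"

# Assumption: the client reuses a key or picks keys in a guessable pattern.
predicted_key = bytes([0x11, 0x22, 0x33, 0x44])

# The attacker pre-masks the request with the key it expects the client to use...
attacker_payload = apply_mask(wanted_on_wire, predicted_key)

# ...so the client's own masking step cancels out and the request appears in clear.
on_wire = apply_mask(attacker_payload, predicted_key)
print(on_wire == wanted_on_wire)  # True

# With a fresh, unpredictable key per frame, the attacker cannot choose
# what the masked bytes look like on the wire.
fresh_key = secrets.token_bytes(4)
on_wire_random = apply_mask(attacker_payload, fresh_key)
```

This is also why the script must not be able to modify a frame after transmission starts: once the key can be observed, the same cancellation trick works on the remaining bytes.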
