Security aspect of websocket masking

Question

While implementing a WebSocket server I came across the requirement for clients to mask their data. I was googling why this was specified, as no reason is given in the RFC, and came across this answer (and others with the same reasoning): about websocket's mask field

It made sense at first, so I just implemented it. But since there's hardly anything I hate more than forced "security", it still kinda gnawed at me. I ended up with two assumptions, and I was wondering if they are true:

Since it's only mandatory for the client, does this assume that malfunctioning proxies or intermediate components only care about the client request? Or that there are no malicious servers?
It's not actually meant to prevent malicious actors from causing remote code execution; it is meant to prevent a good client from accidentally becoming a bad actor by causing code execution without intention. If I am a bad actor, I am just going to send unmasked data, or set the flag saying the payload is masked but not actually mask it.

Is there anything I am missing that would render these assumptions wrong?

score 1 · Accepted Answer · edited Oct 07 '21 at 11:06

The greater issue - which can be used as a vector for attacking intermediaries - is mostly about making sure the WebSockets protocol can pass through the existing infrastructure without being impeded and without causing unintended damage such as cache poisoning or other side-effects.

As I mentioned in my previous answer to a similar question, many of the older intermediaries are unaware of the WebSockets protocol, so clear text that looks like an HTTP request might be edited or acted upon.

For example, a proxy might drop what it perceives as a malformed HTTP request (a non-security issue that results in connection errors and data loss), or a caching proxy might cache a response to what it perceives as a valid GET request (resulting in cache poisoning, which is a security issue).

By masking the text that the client sends (the text that the intermediary might believe to be an HTTP request), the processing mode is "shifted" from clear-text to unrecognized bytes, initiating a "pass through" mode rather than allowing the intermediaries to "process" the data as if it was valid (or invalid) HTTP.
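To make the "shift" concrete, here is a minimal Python sketch of the masking step as RFC 6455 describes it: each payload byte is XORed with one byte of a 4-byte key, cycling through the key. The function names (`mask_payload`, `new_masking_key`) and the sample payload are my own illustration, not part of any library.

```python
import os


def mask_payload(payload: bytes, key: bytes) -> bytes:
    """XOR every payload byte with key[i % 4].

    The operation is its own inverse, so the same function unmasks.
    """
    if len(key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))


def new_masking_key() -> bytes:
    """Pick a fresh random key; clients are expected to do this per frame."""
    return os.urandom(4)


# A payload that an unaware intermediary could mistake for an HTTP request.
payload = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
key = new_masking_key()
masked = mask_payload(payload, key)

# Applying the same key a second time restores the original bytes.
assert mask_payload(masked, key) == payload
```

With any non-zero key, the bytes on the wire no longer spell out the request line, which is all the masking is meant to achieve. It is not encryption: the key travels in the same frame.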

The assumption is that if there's a masking error (an all-zero mask or a mask that never changes), or the masked data appears malicious, the server will disconnect the WebSocket and any possible attack would fail.
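A rough sketch of that server-side check, in Python. RFC 6455 requires the server to close the connection when a client frame arrives unmasked; rejecting an all-zero key is only a policy choice that follows the assumption above, not something the RFC demands. `ProtocolError`, `read_client_frame` and the `reject_zero_key` flag are hypothetical names for illustration, and the sketch ignores fragmentation and control frames.

```python
import struct


class ProtocolError(Exception):
    """Signals that the server should drop the connection."""


def read_client_frame(frame: bytes, reject_zero_key: bool = True) -> bytes:
    """Return the unmasked payload of a single client frame, or raise."""
    if len(frame) < 2:
        raise ProtocolError("truncated frame header")

    # The high bit of the second byte is the MASK flag.
    if not frame[1] & 0x80:
        raise ProtocolError("client frame is not masked")

    # The low 7 bits hold the length, or 126/127 for an extended length.
    length = frame[1] & 0x7F
    offset = 2
    if length == 126:
        if len(frame) < offset + 2:
            raise ProtocolError("truncated extended length")
        (length,) = struct.unpack_from("!H", frame, offset)
        offset += 2
    elif length == 127:
        if len(frame) < offset + 8:
            raise ProtocolError("truncated extended length")
        (length,) = struct.unpack_from("!Q", frame, offset)
        offset += 8

    # The 4-byte masking key follows the length, then the payload.
    if len(frame) < offset + 4 + length:
        raise ProtocolError("truncated frame")
    key = frame[offset:offset + 4]
    offset += 4

    # Policy assumption, not an RFC rule: treat a zero key as a masking error.
    if reject_zero_key and key == b"\x00\x00\x00\x00":
        raise ProtocolError("all-zero masking key")

    payload = frame[offset:offset + length]
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))
```

Note that a client which sets the MASK flag but leaves the payload in clear text passes this check; the server simply decodes different bytes than the client intended. The check only guarantees that the server never answers a frame that was not declared as masked.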

Assuming the masking was properly handled (the data was masked), there's no risk that the response will be cached or otherwise get processed by the intermediaries - so there's no need to waste CPU cycles on masking the response.

Does this mean all attack scenarios involve at least one response from the server? The first unmasked request will be processed by the intermediaries before it goes to the server where it might get disconnected. — Yanick Salzmann, Oct 26 '20 at 02:09
Yes. All attack vectors require the server to respond using WebSocket data to a maliciously crafted WebSocket packet. However, the group of possible connection issues that masking solves is larger than the group of possible attack vectors. In general intermediaries are designed to handle malicious requests and they are not designed to handle "malicious" / corrupt responses. — Myst, Oct 26 '20 at 09:53
That does indeed sound much more reasonable than the remote code execution I've read about a few times, thanks :) — Yanick Salzmann, Oct 26 '20 at 09:55
