I implemented something similar several years ago (using classic AJAX rather than WebSockets).
Rendering the image will be CPU-expensive on the sending client. It will also produce an inflexible result (though perhaps that is what you want: a "most precise" representation of what the client is rendering?) with a relatively large payload, since every pixel has to be represented explicitly.
That'll introduce latency (with the rendering and transfer time) and will potentially bottleneck on bandwidth (having to skip frames to "keep up").
All that said, in a "lab environment" (where you control all factors like bandwidth etc) it may well work fine. I'd be interested to see your findings...
The way I implemented it was by sending the DOM and then rendering it on the receiver's client. You could draw it to a canvas; I just had the browser render it as a webpage document. Whatever you do, make sure you're not opening yourself up to injection vulnerabilities.
CSS etc. was pulled once at the start. Each "frame" was fairly compact (minified) XML markup.
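If you go the DOM route, one way to limit the injection risk is to render each received snapshot inside a sandboxed iframe. To be clear, this is not what my original version did; it is a minimal sketch, and `renderSnapshot`, its parameters, and the idea of swapping in a fresh iframe per frame are assumptions made for illustration.

```javascript
// Hypothetical helper (not from the original implementation).
// doc:       the receiver's `document`
// container: an element on the receiver's page that holds the mirrored view
// markup:    the serialized markup received for the current "frame"
function renderSnapshot(doc, container, markup) {
  const frame = doc.createElement("iframe");
  // An empty `sandbox` attribute applies every restriction, so scripts
  // embedded in the received markup will not run.
  frame.setAttribute("sandbox", "");
  // `srcdoc` renders the markup as the iframe's document.
  frame.srcdoc = markup;
  // Replace whatever snapshot was shown before.
  container.replaceChildren(frame);
  return frame;
}
```

Recreating the iframe for every frame is simple but probably not the cheapest option; treat it as a starting point rather than a tuned solution.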
Some thoughts for if you go with your existing plan:
- Make sure you don't try to send a frame until the previous one has been acknowledged. The frames-per-second rate should be dynamic: as fast as the tightest bottleneck allows.
- Consider compressing the rendered image data. JPEG is typically good at lossy compression while keeping enough detail where it matters (at least to the human eye). For example, see "setting canvas toDataURL jpg quality".
- For a really optimised experience, ignore areas of the screen that haven't changed. I believe (stretching my memory here) that video codecs often employ techniques like this. On the sending client, keep both the previous and the next rendered frame, compare them chunk by chunk (let's say 128x128 pixels), and send only the chunks that actually differ. At best, you only need to send a "No changes" message (indicating that the current frame is identical to the previous one); at worst, you need to send every 128x128 chunk.
- Consider how you store them on the server. It only needs to be a very temporary FIFO buffer, right? Don't go overkill with SQL databases etc. Find a solution that solves the problem efficiently.
- To reduce the load on the receiving client, you might consider rendering a video stream (HTML5 compatible format) on the server based on the individual frames and stream that down to the client.
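The "wait for the acknowledgement" suggestion above could be sketched roughly like this. It assumes a hypothetical `transmit(frame)` callback that wraps whatever transport you use (AJAX, WebSocket, ...) and assumes your own code calls `ack()` when the receiver confirms a frame; none of these names come from a real library.

```javascript
// Hypothetical sender: at most one frame is in flight at any time, and
// frames captured while waiting overwrite each other, so only the newest
// one is sent once the acknowledgement arrives.
function createFrameSender(transmit) {
  let awaitingAck = false; // true while a frame is in flight
  let latest = null;       // newest captured frame that has not been sent yet

  function flush() {
    if (awaitingAck || latest === null) return;
    const frame = latest;
    latest = null;
    awaitingAck = true;
    transmit(frame);
  }

  return {
    // Call whenever a new frame has been captured.
    offer(frame) {
      latest = frame; // drops any older frame that was still waiting
      flush();
    },
    // Call when the receiver acknowledges the frame in flight.
    ack() {
      awaitingAck = false;
      flush();
    },
  };
}
```

With this shape the effective frame rate is whatever the round trip allows: a slow link simply means more captured frames get dropped before they are ever sent.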
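For the "only send what changed" idea above, a minimal sketch of the chunk comparison might look like the following. It assumes both frames are flat RGBA byte arrays of the same dimensions (the layout you get from a canvas `getImageData(...).data`); `changedTiles` and `tileDiffers` are hypothetical names, and the 128-pixel default is just the figure used above.

```javascript
// Returns the tiles (rectangles) whose pixels differ between two frames.
// prev/next: array-likes of RGBA bytes, 4 bytes per pixel, row-major.
function changedTiles(prev, next, width, height, tile = 128) {
  const changed = [];
  for (let ty = 0; ty < height; ty += tile) {
    for (let tx = 0; tx < width; tx += tile) {
      // Tiles on the right and bottom edges may be smaller than `tile`.
      const w = Math.min(tile, width - tx);
      const h = Math.min(tile, height - ty);
      if (tileDiffers(prev, next, width, tx, ty, w, h)) {
        changed.push({ x: tx, y: ty, width: w, height: h });
      }
    }
  }
  return changed; // an empty array means "No changes"
}

// True as soon as any byte inside the given rectangle differs.
function tileDiffers(prev, next, frameWidth, x, y, w, h) {
  for (let row = y; row < y + h; row++) {
    const start = (row * frameWidth + x) * 4;
    const end = start + w * 4;
    for (let i = start; i < end; i++) {
      if (prev[i] !== next[i]) return true;
    }
  }
  return false;
}
```

In a browser, each changed rectangle could then be copied to a small temporary canvas and encoded with `toDataURL("image/jpeg", quality)`, so the JPEG suggestion and this one combine. An exact byte comparison is the simplest test; whether you want a tolerance threshold instead depends on how noisy your rendering is.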