Q : "Is this naive solution possible...?"
Yes.

You might also like this related piece of reading on the topic.
Q : "...should the images be encoded using H264/5?"
Well, that depends.
Given the said 20 Hz BMP-image ingress rate, there are about 50 [ms]
per image for the whole visual part of the (principally distributed) MVC system.
Within those 50 ms, there ought to be zero time wasted and nothing may ever block.
So the receiving engine must keep a steady data flow of the ingress, with no traffic overloads by any other, un-coordinated bandwidth (memory, I/O, ...) eater (the BMP images' size was not mentioned so far), and it must provide some means of deciding what gets fed into the presenter engine whenever the "next" data due to be shown is incomplete or not present at all.
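A minimal sketch of such a never-blocking hand-over, in Python (all names here, FrameStore, receiver_loop, presenter_tick, are hypothetical): the presenter polls a store that always answers with the last complete frame, so a late or incomplete frame never stalls the repaint:

```python
import threading
from queue import Queue
from typing import Optional

class FrameStore:
    """Holds the most recent complete frame; readers never block on ingress."""
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._frame: Optional[bytes] = None    # last complete frame

    def publish(self, frame: bytes) -> None:
        with self._lock:
            self._frame = frame

    def latest(self) -> Optional[bytes]:
        with self._lock:
            return self._frame                 # may be None before 1st frame

def receiver_loop(ingress: Queue, store: FrameStore) -> None:
    """Drains the ingress; only complete frames get published."""
    while True:
        frame = ingress.get()                  # blocking is fine: own thread
        if frame:                              # naive check: drop empty payloads
            store.publish(frame)

def presenter_tick(store: FrameStore) -> None:
    """Called every 50 ms; must never wait on the transport."""
    frame = store.latest()
    if frame is None:
        return                                 # nothing yet: keep last screen
    # ... hand `frame` over to the display hardware here ...
```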
So, what about compression?
Compression is a double-edged sword: you obviously reduce the data volume (with some SER/DES codecs even at the cost of losing a part of the original data richness, yes, exactly, with knowingly lossy compression schemes), while typically adding some additional data re-framing and, perhaps, Reed-Solomon or other "line-code" error-detection/error-correction overhead, so the final volume of data-to-transmit need not be as small as the pure compression part itself would allow in theory.
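A back-of-envelope illustration (every number below is an assumption, not a measurement) of how the re-framing and error-correction add-ons eat back a part of the codec's theoretical gain:

```python
# All numbers are illustrative assumptions, not measurements.
raw_frame    = 1920 * 1080 * 3        # ~6.2 MB for a raw 24-bit raster
codec_ratio  = 10.0                   # assumed lossy-codec volume gain
fec_overhead = 0.15                   # assumed 15 % error-correction add-on
reframing    = 0.02                   # assumed  2 % re-framing add-on

on_wire = raw_frame / codec_ratio * (1 + fec_overhead + reframing)
print(f"raw: {raw_frame / 1e6:.2f} MB -> on-wire: {on_wire / 1e6:.2f} MB")
# raw: 6.22 MB -> on-wire: 0.73 MB  (the 10x gain shrinks to ~8.5x)
```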
Result?
All that comes at remarkable costs on both ends: on the SER/coder side, which has to push as little data as possible into the (knowingly low-bandwidth, most often un-manageable-latency) transport, and again on the decoder/DES side.
So, given the 20 Hz refresh rate leaves no more than a total of 50 ms for one frame repaint, the lump sum of receiver-engine processing and presenter-engine processing cannot exceed those 50 ms per frame. Any decode-related and DESerialiser-related processing is a deciding factor in this.
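A sketch of guarding that per-frame budget (decode_fn and present_fn are hypothetical stand-ins for your actual DES/decode and repaint steps):

```python
import time

FRAME_BUDGET_S = 1.0 / 20.0                  # 20 Hz -> 50 ms per frame

def process_frame(payload: bytes, decode_fn, present_fn) -> None:
    """Run one DES/decode + repaint cycle and flag budget overruns."""
    t0 = time.perf_counter()
    image = decode_fn(payload)               # DESerialise + decode cost
    present_fn(image)                        # presenter-engine cost
    spent_ms = (time.perf_counter() - t0) * 1e3
    if spent_ms > FRAME_BUDGET_S * 1e3:
        print(f"budget overrun: {spent_ms:.1f} ms > 50.0 ms")
```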
Yet one may succeed, if proper design and flawless engineering take place to do this right and robustly enough.
Check your target device for all of the following (a small probe sketch follows the list):
- transport resource limits
(i.e. how much time gets burnt and what resources get allocated / locked per arrival),
- memory-I/O
(latency and memory-I/O concurrency limits for any interleaved data-flow patterns),
- cache hierarchy
(if present on the device: sizes, costs and I/O limits),
- processing limits
(if multicore, the more so if NUMA, beware of non-uniform memory-I/O traps),
- presenter-engine hardware bottlenecks
(memory-I/O, display-device buffer-I/O limits and any other add-on latencies),
since any of these details may de-rail your smooth flow of (error-resilient) data that has to get finally presented on the target device in due time for the wished-for target of 20 FPS.
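As one concrete starting point, a tiny probe (illustrative only; the frame size is an assumed placeholder) of how many of those 50 ms a single full-frame memory copy costs on the target device:

```python
import time

FRAME_BYTES = 1920 * 1080 * 3                # assumed raw-frame size

def probe_frame_copy_ms(iterations: int = 100) -> float:
    """Average cost, in ms, of one full-frame memory copy."""
    src = bytearray(FRAME_BYTES)
    t0 = time.perf_counter()
    for _ in range(iterations):
        _ = bytes(src)                       # one full-frame copy
    return (time.perf_counter() - t0) / iterations * 1e3

if __name__ == "__main__":
    print(f"~{probe_frame_copy_ms():.2f} ms per full-frame copy")
```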
Good luck!
Nota bene:
if you can harness data reduction right at the source, grab that chance and do it. In any case where you know a priori that all target presenter engines are B/W, never "send" colourful BMPs: strip off all the per-frame colour-table and high-level colour-profile tricks, and stream not a bit more than just the raw, right-sized raster data that matches the worst-case processing and latency ceiling of your target terminal device(s).
Review carefully all those redundant and principally wasted (as they repeat per frame) parts of the generic BMP data-format definition and do not re-broadcast 'em
;)
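Just to make that last point concrete, a minimal sketch, assuming an uncompressed 24-bit BMP with a BITMAPINFOHEADER, of stripping everything but the raw raster before any (re-)broadcast:

```python
import struct
from typing import Tuple

def raw_raster_from_bmp(bmp: bytes) -> Tuple[int, int, bytes]:
    """Return (width, height, raster bytes) of a simple 24-bit BMP."""
    if bmp[:2] != b"BM":
        raise ValueError("not a BMP stream")
    pixel_offset = struct.unpack_from("<I", bmp, 10)[0]   # raster start
    width, height = struct.unpack_from("<ii", bmp, 18)    # BITMAPINFOHEADER
    bits_per_px = struct.unpack_from("<H", bmp, 28)[0]
    if bits_per_px != 24:
        raise ValueError("this sketch handles 24-bit BMPs only")
    # NB: rows are still padded to 4-byte multiples inside the raster.
    return width, abs(height), bmp[pixel_offset:]

# Sender side: broadcast width/height once, then per frame only the
# raster payload, never the repeating header / colour-table bytes.
```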