Efficient way to combine a lazy ByteString and a lazy Text

Question

I'm writing some code that is rendering an HTML page (via servant, if that's relevant), and for various complicated reasons, I have to construct the HTML by "combining" two segments.

One segment is fetched from an internal HTTP API, which returns a lazy ByteString (Data.ByteString.Lazy).
The other segment is rendered using the ede library, which generates a lazy Text (Data.Text.Lazy).

What options do I have if I have to combine these two segments efficiently? The two segments can be reasonably large (a few hundred KB each). This servant server is going to see quite a lot of traffic, so any inefficiency (like copying hundreds of KB of memory for every request/response) will quickly add up.

Anything nontrivial that happens during the generation of the segments should certainly outweigh the overhead of copying one of them to the format of the other. (And if only trivial things happen then it should be easy to adapt the code to give the preferred type right away.) But, if these segments are independent, why not just generate two HTML files and combine them client-side? — leftaroundabout, Nov 15 '22 at 15:24
Also... are you sure you need to have so much dynamically generated HTML code in the first place? Outsourcing the constant parts to a CDN and/or storing any data portions in an efficient binary format would improve performance much more than anything you can do on the side of the Haskell types that store the HTML. — leftaroundabout, Nov 15 '22 at 15:29
@leftaroundabout can't really combine these segments on the client side. The lazy ByteString segment is actually the layout (header + footer) of the page. And the lazy Text part is the body/main content of the page. They _must_ be combined on the server side before being served to the client. — Saurabh Nanda, Nov 15 '22 at 15:48
@leftaroundabout from a CPU standpoint I agree that probably copying of data will not be a bottleneck compared to, say, accessing the DB. But what about memory usage under high load/traffic? — Saurabh Nanda, Nov 15 '22 at 15:49
@leftaroundabout pre-generating the HTML and storing in Redis or CDN would be my next step, but I was curious about how to get this job done nevertheless. — Saurabh Nanda, Nov 15 '22 at 15:58
How would you combine them if efficiency *weren't* an issue? If all you need to do is concatenate them, I would think `append lazyByteString (put lazyText)` was sufficient. (The entire thing has to be realized to send to the client.) The issue then seems to be in separating the two halves of the header/footer, so `let (header, footer) = _ lazyByteString in append header (append (put lazyText) footer)`? — chepner, Nov 15 '22 at 17:29

danidiaz · Answer 1 · 2022-11-15T19:34:03.197

4

Assuming your endpoint returns a lazy ByteString, use the function encodeUtf8 from Data.Text.Lazy.Encoding to convert your lazy Text into a lazy ByteString, and then return the append of the two lazy ByteStrings.
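A minimal sketch of that approach. The function name combineSegments and its argument names are hypothetical; only encodeUtf8 and append come from the libraries. The order of the two segments is an assumption, so swap the arguments if the Text segment should come first.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Encoding as TLE

-- | Append a rendered lazy Text segment to a lazy ByteString segment.
-- `combineSegments` is a hypothetical helper name, not a library function.
combineSegments :: BL.ByteString -> TL.Text -> BL.ByteString
combineSegments apiSegment renderedSegment =
  -- encodeUtf8 turns the lazy Text into a lazy ByteString;
  -- append then joins the two lazy ByteStrings.
  BL.append apiSegment (TLE.encodeUtf8 renderedSegment)
```

If the layout has to wrap the body (header, then body, then footer), the same two functions apply once the layout has been split into its two halves.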

Internally, lazy ByteStrings are basically lists of strict ByteString chunks. Concatenating them is list concatenation: only the list spine is rebuilt, and no new allocations are made for the bytes themselves.

The Data.ByteString.Lazy documentation describes the type as:

"A time and space-efficient implementation of lazy byte vectors using lists of packed Word8 arrays"

"Some operations, such as concat, append, reverse and cons, have better complexity than their Data.ByteString equivalents, due to optimisations resulting from the list spine structure."
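The chunk structure can be inspected with toChunks. The sketch below uses made-up example values (exampleA, exampleB) and a hypothetical helper chunkSizes to show that appending keeps the existing chunks in order rather than merging them into one new buffer.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

-- | Sizes of the strict chunks that make up a lazy ByteString, in order.
-- `chunkSizes` is a hypothetical helper for illustration.
chunkSizes :: BL.ByteString -> [Int]
chunkSizes = map BS.length . BL.toChunks

-- Two example values built from explicit chunks (made-up data).
exampleA, exampleB :: BL.ByteString
exampleA = BL.fromChunks [BS.pack [1, 2, 3], BS.pack [4, 5]]
exampleB = BL.fromChunks [BS.pack [6, 7, 8, 9]]

-- The chunks of the appended value are the chunks of exampleA
-- followed by the chunks of exampleB.
appended :: BL.ByteString
appended = BL.append exampleA exampleB
```

Here chunkSizes appended evaluates to [3,2,4]: the three original chunks are still there, in the same order.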

If you had a large number of lazy ByteStrings instead of two, you should take the extra step of using lazyByteString to convert them to Builders, concatenating the Builders, and then getting the resulting lazy ByteString with toLazyByteString. This avoids the inefficiency of left-associated list concatenation.

From the Data.ByteString.Builder documentation:

"Builders denote sequences of bytes. They are Monoids where mempty is the zero-length sequence and mappend is concatenation, which runs in O(1)."
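For the many-segments case, a sketch could look like this. The name concatSegments is hypothetical; lazyByteString and toLazyByteString are the Data.ByteString.Builder functions mentioned above.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL

-- | Concatenate many lazy ByteStrings by going through Builder.
-- `concatSegments` is a hypothetical helper name.
concatSegments :: [BL.ByteString] -> BL.ByteString
concatSegments =
  -- foldMap converts each segment to a Builder and combines the
  -- Builders with mappend; toLazyByteString runs the result once.
  B.toLazyByteString . foldMap B.lazyByteString
```

The resulting bytes are the same as with repeated append, though the chunk boundaries of the output may differ from those of the inputs.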

edited Nov 15 '22 at 19:34

answered Nov 15 '22 at 19:22


Related: https://stackoverflow.com/questions/51878473/philosophy-behind-http-simple-setrequestbodylbs/51880216#51880216 – danidiaz Nov 15 '22 at 19:35
How expensive is the `encodeUtf8` operation? – Saurabh Nanda Nov 16 '22 at 02:38
@SaurabhNanda irrelevant, I assume, because you will have to call it sooner or later! – user253751 Nov 16 '22 at 13:14
@SaurabhNanda Since version `2.0`, "text" uses UTF8 internally. So `encodeUtf8` should be fast. https://hackage.haskell.org/package/text-2.0.1/changelog – danidiaz Nov 16 '22 at 14:22
