
I need to encode some data to JSON and then push it to the syslog using hsyslog. The types of the two relevant functions are:

Aeson.encode :: ToJSON a => a -> Data.ByteString.Lazy.ByteString

System.Posix.Syslog.syslog :: Maybe Facility
                           -> Priority
                           -> CStringLen
                           -> IO () 

What's the most efficient way (speed & memory) to convert a Lazy.ByteString -> CStringLen? I found Data.ByteString.Unsafe, but it works only with ByteString, not Lazy.ByteString?

Shall I just stick in an unsafeUseAsCStringLen . Data.String.Conv.toS and call it a day? Will it do the right thing with respect to efficiency?

Saurabh Nanda

1 Answer


I guess I would use Data.ByteString.Lazy.toStrict in place of toS, to avoid the additional package dependency.

Anyway, you won't find anything more efficient than:

unsafeUseAsCStringLen (toStrict lbs) $ \cstrlen -> ...

In general, toStrict is an "expensive" operation, because a lazy ByteString will generally be made up of a bunch of "chunks" each consisting of a strict ByteString and not necessarily yet loaded into memory. The toStrict function must force all the strict ByteString chunks into memory and ensure that they are copied into a single, contiguous block as required for a strict ByteString before the no-copy unsafeUseAsCStringLen is applied.

However, toStrict handles a lazy ByteString that consists of a single chunk optimally without any copying.
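To make the chunk structure concrete, here is a minimal sketch that uses only the bytestring package. The helper name withLazyCStringLen is hypothetical (it is not part of any library), and the action passed to it merely reads the buffer back; in the question's setting, the call to syslog would go in its place.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.ByteString.Lazy as BL
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Foreign.C.String (CStringLen, peekCStringLen)

-- Hypothetical helper: flatten a lazy ByteString with toStrict, then lend
-- the resulting buffer to an IO action without a further copy.
-- The pointer must not escape the action and must not be written through.
withLazyCStringLen :: BL.ByteString -> (CStringLen -> IO a) -> IO a
withLazyCStringLen lbs = unsafeUseAsCStringLen (BL.toStrict lbs)

main :: IO ()
main = do
  -- A lazy ByteString deliberately built from two strict chunks.
  let multi = BL.fromChunks ["{\"a\":", "1}"]
  -- Number of strict chunks that toStrict would have to merge.
  print (length (BL.toChunks multi))
  -- Stand-in for the syslog call: read the contiguous buffer back.
  withLazyCStringLen multi $ \cstr -> do
    s <- peekCStringLen cstr
    putStrLn s
```

Checking length . toChunks like this is a cheap way to see whether toStrict has to merge anything for a given value.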

In practice, aeson uses an efficient Data.ByteString.Builder to create the JSON, and if the JSON is reasonably small (less than 4k, I think), it will build a single-chunk lazy ByteString. In this case, toStrict is zero-copy, unsafeUseAsCStringLen is zero-copy, and the entire operation is basically free.
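The single-chunk claim can be checked directly. The sketch below assumes that aeson's output goes through the standard Data.ByteString.Builder.toLazyByteString, as described above; it does not call aeson itself, and the sample payload and the chunkCount helper are made up for illustration.

```haskell
module Main where

import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL

-- Hypothetical helper: how many strict chunks make up a lazy ByteString.
chunkCount :: BL.ByteString -> Int
chunkCount = length . BL.toChunks

main :: IO ()
main = do
  -- A small, JSON-like payload rendered through a Builder.
  let small = B.toLazyByteString
                (B.string7 "{\"level\":\"info\",\"msg\":\"started\"}")
  print (chunkCount small)
  -- A much larger payload; the exact chunk count depends on the
  -- bytestring version's buffer allocation strategy.
  let big = B.toLazyByteString (mconcat (replicate 100000 (B.string7 "x")))
  print (chunkCount big)
```

For the small payload this should report a single chunk, which is the case where toStrict has nothing to merge.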

But note that, in your application, where you are passing the string to the syslogger, fretting about the efficiency of this operation is crazy. My guess would be that you'd need thousands of copy operations to even make a dent in the performance of the overall action.

K. A. Buhr
  • "and not necessarily yet loaded into memory. " -- why do you say that? Does this mean that lazy ByteStrings are doing lazy IO under the hood? If I call `BSL.readFile`, it doesn't load all the contents into memory? What will happen if I call `BSL.readFile`, delete the file, and then try to access the entire lazy bytestring that was read? – Saurabh Nanda Mar 29 '20 at 15:37
  • The last two paragraphs of your answer address my original question. Thank you for the insightful reply. If you have time, please see if you can explain https://stackoverflow.com/questions/60236427/most-efficient-way-of-converting-a-data-bytestring-lazy-to-a-cstringlen#comment107773467_60240781 – Saurabh Nanda Mar 29 '20 at 15:38
  • Yes, `BSL.readFile` does lazy I/O under the hood. It *could* do strict I/O in principle, but then there wouldn't be much point in returning a lazy BS. Nonetheless if you use `BSL.readFile`, then delete the file and try to read the contents, it'll still work fine, but that's because the file handle is held open, and data can still be read from an open handle on a deleted file. (Well, with exceptions for certain networked filesystems.) However, if you destructively modify the file (overwrite or truncate it), then those changes would, in general, be reflected in the lazily read value. – K. A. Buhr Mar 29 '20 at 17:30