Using GNU/Linux system call `splice` for zero-copy Socket to Socket data transfers in Haskell

Question

Update: Mr. Nemo's answer helped solve the problem! The code below contains the fix! See the nb False and nb True calls below.

There is also a new Haskell package called splice (, which has OS-specific and portable implementations of best known socket to socket data transfer loops).

I have the following (Haskell) code:

#ifdef LINUX_SPLICE
#include <fcntl.h>
{-# LANGUAGE CPP #-}
{-# LANGUAGE ForeignFunctionInterface #-}
#endif

module Network.Socket.Splice (
    Length
  , zeroCopy
  , splice
#ifdef LINUX_SPLICE
  , c_splice
#endif
  ) where

import Data.Word
import Foreign.Ptr

import Network.Socket
import Control.Monad
import Control.Exception
import System.Posix.Types
import System.Posix.IO

#ifdef LINUX_SPLICE
import Data.Int
import Data.Bits
import Unsafe.Coerce
import Foreign.C.Types
import Foreign.C.Error
import System.Posix.Internals
#else
import System.IO
import Foreign.Marshal.Alloc
#endif


zeroCopy :: Bool
zeroCopy =
#ifdef LINUX_SPLICE
  True
#else
  False
#endif


type Length =
#ifdef LINUX_SPLICE
  (#type size_t)
#else
  Int
#endif


-- | The 'splice' function pipes data from
--   one socket to another in a loop.
--   On Linux this happens in kernel space with
--   zero copying between kernel and user spaces.
--   On other operating systems, a portable
--   implementation utilizes a user space buffer
--   allocated with 'mallocBytes'; 'hGetBufSome'
--   and 'hPut' are then used to avoid repeated 
--   tiny allocations as would happen with 'recv'
--   'sendAll' calls from the 'bytestring' package.
splice :: Length -> Socket -> Socket -> IO ()
splice l (MkSocket x _ _ _ _) (MkSocket y _ _ _ _) = do

  let e  = error "splice ended"

#ifdef LINUX_SPLICE

  (r,w) <- createPipe
  print ('+',r,w)
  let s  = Fd x -- source
  let t  = Fd y -- target
  let c  = throwErrnoIfMinus1 "Network.Socket.Splice.splice"
  let u  = unsafeCoerce :: (#type ssize_t) -> (#type size_t)
  let fs = sPLICE_F_MOVE .|. sPLICE_F_MORE
  let nb v = do setNonBlockingFD x v
                setNonBlockingFD y v
  nb False
  finally
    (forever $ do 
       b <- c $ c_splice s nullPtr w nullPtr    l  fs
       if b > 0
         then   c_splice r nullPtr t nullPtr (u b) fs)
         else   e
    (do closeFd r
        closeFd w
        nb True
        print ('-',r,w))

#else

  -- ..    

#endif


#ifdef LINUX_SPLICE
-- SPLICE

-- fcntl.h
-- ssize_t splice(
--   int          fd_in,
--   loff_t*      off_in,
--   int          fd_out,
--   loff_t*      off_out,
--   size_t       len,
--   unsigned int flags
-- );

foreign import ccall "splice"
  c_splice
  :: Fd
  -> Ptr (#type loff_t)
  -> Fd
  -> Ptr (#type loff_t)
  -> (#type size_t)
  -> Word
  -> IO (#type ssize_t)

sPLICE_F_MOVE :: Word
sPLICE_F_MOVE = (#const "SPLICE_F_MOVE")

sPLICE_F_MORE :: Word
sPLICE_F_MORE = (#const "SPLICE_F_MORE")
#endif

Note: The code above now just works! Below is no longer valid thanks to Nemo!

I call splice as defined above with two open and connected sockets (which are already used to transmit minimal amount of handshake data using either the sockets API send and recv calls or converted to handles and used with hGetLine and hPut) and I keep getting:

Network.Socket.Splice.splice: resource exhausted (Resource temporarily unavailable)

at the first c_splice call site: c_splice returns -1 and sets some errno to a value (probably EAGAIN) that reads resource exhausted | resource temporarily unavailable when looked up.

I tested calling splice with different Length values: 1024, 8192.

Your current version creates a new pipe every time you call splice(). This is OK if you are always moving large blocks, but for small blocks that may impose a big overhead. I usually create a "Splicer" object to own the pipe, then call it repeatedly with from+to descriptors to move the data. — Nemo, Apr 10 '12 at 17:10
@Nemo `splice` (not `c_splice`) is actually an infinite loop due to `forever`. I guess I should rename `splice` to something like `loopSplice` to make that clear. So currently it creates one pipe per proxy connection not per each `c_splice` call. — Cetin Sert, Apr 10 '12 at 18:26
I have still a lot to test with the portable implementation on Windows though so I will definitely have enough time to spend thinking on a better name as well. Open for your suggestions as well :) — Cetin Sert, Apr 10 '12 at 18:30
Ah... Told you I do not actually know Haskell :-). (A search for "Haskell non-blocking socket" got me to the relevant source.) Oh for the days when I could code in a functional language... — Nemo, Apr 10 '12 at 22:08
@CetinSert thank you for the package, it will certainly be useful. For us not familiar with this kernel feature, could you explain how to do an HTTP 1.1 reverse proxy with it ? This example requires to read and interpret headers of varying length in userspace, then splice socket to socket for a limited amount of bytes, then get back control in userspace for next request. Is that possible ? — Paul R, Apr 16 '12 at 21:44
@PaulR The Linux system call [`c_splice`](http://hackage.haskell.org/packages/archive/splice/0.5.1/doc/html/System-IO-Splice-Linux.html) does not affect socket states in any way and sockets can be used before and after using `c_splice` on their file descriptors. I do a tiny bit of conventional initial send / receive on sockets before entering the [`splice`](http://hackage.haskell.org/packages/archive/splice/0.5.1/doc/html/Network-Socket-Splice.html) `flow >-<` loop in [this example](http://beta.corsis.eu/technology/). — Cetin Sert, Apr 17 '12 at 11:19
@PaulR `System.IO.Splice.Linux` handles pre- and post-fix additions. The `Network.Socket.Splice` API on the other hand only exports terminal operations on sockets; I plan extensions to allow for message reshaping as well as loop control. As I want all hackage packages I create to be truly general purpose and not tied to my own implementation needs in a certain application (achieved with `System.IO.Splice.Linux`, not fully with `Network.Socket.Splice`), I am willing to hear all suggestions that can help achieve APIs that can accommodate almost every need. — Cetin Sert, Apr 17 '12 at 11:30
@PaulR https://github.com/corsis/splice/issues might be a good place to post any issues you may encounter and like to see addressed. — Cetin Sert, Apr 17 '12 at 11:34
@CetinSert ah ! I did not look at the right module indeed. splice_c has the right API to interleave userspace logic with kernelspace byte-copying. Thank you for this package, I hope I will have time to play with it at some point. — Paul R, Apr 17 '12 at 13:02
@Nemo I have just run into the big overhead issue you mentioned a long time ago :) Still the performance of splice with long streams are really good: http://stackoverflow.com/questions/12230316/do-other-operating-systems-implement-the-linux-system-call-splice. I will do some sort of intelligent pipe caching in the future. — Cetin Sert, Sep 02 '12 at 01:10

score 13 · Accepted Answer · answered Apr 09 '12 at 23:13

13

I don't know Haskell, but "resource temporarily unavailable" is EAGAIN.

And it looks like Haskell sets its sockets to non-blocking mode by default. So if you try to read from one when there is no data, or try to write to one when its buffer is full, you will fail with EAGAIN.

Figure out how to change the sockets to blocking mode, and I bet you will solve your problem.

[update]

Alternatively, call select or poll before attempting to read or write the socket. But you still need to handle EAGAIN, because there are rare corner cases where Linux select will indicate a socket is ready when actually it isn't.

answered Apr 09 '12 at 23:13

Nemo

70,042
10
116
153

Thanks for the hint! I am on to seeing how far it helps :) – Cetin Sert Apr 09 '12 at 23:14
5

Wow you don't know Haskell yet you can pinpoint the exact line to help solve my problem, amazing answer!! – Cetin Sert Apr 09 '12 at 23:38
It was setting the non-blocking mode appropriately around the `splice` calls that solved the problem. I am truly thankful for your classy answer. :) – Cetin Sert Apr 10 '12 at 00:20

score 0 · Answer 2 · answered Apr 09 '12 at 23:27

0

Would the sendfile() syscall work for you? If so you can use the sendfile package.

answered Apr 09 '12 at 23:27

tibbe

8,809
7
36
64

1

Thanks. According to this: http://kerneltrap.org/node/6505 `splice` is the right way to go if both sides are sockets. Am I wrong? Btw Nemo's answer solved the problem for me so I will stick to my implementation of `splice`! – Cetin Sert Apr 09 '12 at 23:37
The commented out section in my code contains a portable piece of user space code using `mallocBytes`, `hGetBufSome` and `hPut` which perform better than `network-bytestring`'s `recv`, `sendAll` as well. I will polish it up and put it on Hackage soon enough. I am working on a high-performance, extremely simple, small and clean proxy application and one of its design principles is: **do not depend on external packages unless absolutely necessary** and although `sendfile`'s dependency list is way too clean compared to most other packages, I have seen rival proxy software also using `splice`. :) – Cetin Sert Apr 09 '12 at 23:51
1

`sendfile` sends a file. It does not work if the source is not a file or if the destination is not a socket. `splice` is what you need for file-to-file or socket-to-socket zero-copy. (Although it is usually not really zero-copy... long story) – Nemo Apr 10 '12 at 00:35
@Nemo: would love to hear the full story if you had links to its lengthier versions. My current understanding is that copying is still occuring but kept tightly somewhere in the kernel space rather than occuring across the kernel - user space boundary. – Cetin Sert Apr 10 '12 at 03:07
1

@Cetin: I don't have an actual reference, but that is essentially correct. `splice` (and `sendfile`) are only zero-copy when sending from a file to a socket; the kernel will just pass references to the page cache and have the network card DMA directly from there. In all other cases at least one copy is made, but you are correct that this happens in kernel space and is thus probably faster. (Although userspace read+write using a small buffer -- i.e. via the L1 cache -- can be pretty fast too. As is often the case, it is easier to measure than to predict.) – Nemo Apr 10 '12 at 17:06
@Nemo as soon as I get the proxy running with the new library as well on Windows as it now does on Linux, I can spend some time to see if performance numbers pay the effort :) – Cetin Sert Apr 10 '12 at 18:32

Using GNU/Linux system call `splice` for zero-copy Socket to Socket data transfers in Haskell

2 Answers2

Linked