
read and sysread have very similar documentation. What are the differences between the two?

ikegami

1 Answer


About read:

  • read supports PerlIO layers.
  • read works with any Perl file handle[1].
  • read buffers.
  • read obtains data from the system in fixed-size blocks, 8 KiB by default[2].
  • read may block if less data than requested is available[3].
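
As a minimal sketch of the first two points, the snippet below uses read on an in-memory handle that has a decoding layer. The sample string and variable names are made up for illustration. With the layer in place, the length passed to read counts characters rather than bytes.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample data: "\x{E9}" (e-acute) occupies two bytes in UTF-8.
my $bytes = encode('UTF-8', "h\x{E9}llo");

# In-memory handle with a decoding layer; no system file descriptor is involved.
open(my $fh, '<:encoding(UTF-8)', \$bytes) or die "open: $!";

# Because of the layer, the requested length is counted in characters.
my $got = read($fh, my $buf, 3);
die "read: $!" if !defined $got;

printf "source is %d bytes; read returned %d; buffer holds %d characters\n",
    length($bytes), $got, length($buf);
close $fh;
```
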

About sysread:

  • sysread doesn't support PerlIO layers (meaning it requires a raw a.k.a. binary handle).
  • sysread only works with Perl file handles that map to a system file handle/descriptor[4].
  • sysread doesn't buffer.
  • sysread performs a single system call.
  • sysread returns immediately if data is available to be returned, even if the amount of data is less than the amount requested.
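
The last point can be sketched with a pipe inside a single process. This assumes a platform where pipe is available; the 1024-byte request size is arbitrary. Only five bytes are waiting in the pipe, so sysread returns those five instead of blocking until 1024 arrive.

```perl
use strict;
use warnings;

pipe(my $reader, my $writer) or die "pipe: $!";

# Write 5 bytes directly to the descriptor, bypassing Perl's buffers.
defined syswrite($writer, "hello") or die "syswrite: $!";

# Request far more than is available; sysread hands back what is there.
my $n = sysread($reader, my $buf, 1024);
die "sysread: $!" if !defined $n;

print "requested 1024, got $n\n";
```

Note that sysread returns undef on error and 0 at end of file, so both cases need to be checked separately.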

Summary and conclusions:

  • read works with any Perl file handle, while sysread is limited to Perl file handles mapped to a system file handle/descriptor.
  • read isn't compatible with select[5], while sysread is compatible with select.
  • read can perform decoding for you, while sysread requires that you do your own decoding.
  • read should be faster for very small reads, while sysread should be faster for very large reads.
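
To illustrate the select point, here is a rough sketch of a select-driven loop built on IO::Select and sysread. The pipe stands in for a socket or child process, and the 5-second timeout and 4096-byte chunk size are arbitrary choices. Using read here would be risky because data already sitting in Perl's buffer is invisible to select.

```perl
use strict;
use warnings;
use IO::Select;

pipe(my $reader, my $writer) or die "pipe: $!";
defined syswrite($writer, "abc") or die "syswrite: $!";
close $writer;    # lets the reader see end of file after the data

my $sel  = IO::Select->new($reader);
my $data = '';

while ($sel->count) {
    for my $fh ($sel->can_read(5)) {
        # Append whatever is available to the end of $data.
        my $n = sysread($fh, $data, 4096, length $data);
        die "sysread: $!" if !defined $n;
        if ($n == 0) {    # end of file: stop watching this handle
            $sel->remove($fh);
            close $fh;
        }
    }
}

print "collected: $data\n";
```
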

Notes:

  1. These include, for example, tied file handles and those created using open(my $fh, '<', \$var).

  2. Before 5.14, Perl read in 4 KiB blocks. Since 5.14, the size of the blocks is configurable when you build perl, with a default of 8 KiB.

  3. In my experience, read will return exactly the amount requested (if possible) when reading from a plain file, but may return less when reading from a pipe. These results are by no means guaranteed.

  4. fileno returns a non-negative number for these. These include, for example, handles that read from plain files, from pipes and from sockets, but not those mentioned in [1].

  5. I'm referring to the 4-argument one called by IO::Select.
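
A quick way to check which kind of handle you have, per notes 1 and 4, is to look at fileno. This is only a sketch with made-up variable names; it relies on the documented behaviour that fileno returns -1 for in-memory handles.

```perl
use strict;
use warnings;

my $text = "in-memory data";
open(my $mem, '<', \$text) or die "open: $!";    # handle from note 1
pipe(my $r, my $w) or die "pipe: $!";            # handle from note 4

my $mem_fd  = fileno($mem);
my $pipe_fd = fileno($r);

printf "in-memory handle: fileno = %d\n", $mem_fd;
printf "pipe handle:      fileno = %d\n", $pipe_fd;
```
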

ikegami
  • Great summary. - should be in perlfunc. This: "`read` should be faster for small reads, while `sysread` should be faster for large reads." is exactly what is needed. Of course, given the infinite possibilities of the real world, it may not **always** be true but a mostly truthy perlish guideline is what I want. – G. Cito Mar 30 '16 at 17:10
  • In a response to [another question](http://stackoverflow.com/a/36208336/2019415) I used [`Stream::Reader`](https://metacpan.org/pod/Stream::Reader). As an experiment I replaced `read` with `sysread` in `Reader.pm` and gained 9-10% throughput - it seemed too easy. Besides the obvious bits (buffering, encoding), is it just a question of benchmarking and testing? Can you speak to any data integrity, failover/robustness elements of this? – G. Cito Mar 30 '16 at 17:19
  • @G.Cito in reusable code such as Stream::Reader, you have to assume filehandles may have layers, so sysread is not an option. – ysth Mar 30 '16 at 20:10
  • @G. Cito, Talk of "UTF-8 mode" implies it's not just a possibility that they have layers, but that it's a supported mode of operation. That prevents `sysread` from being a valid option. – ikegami Mar 30 '16 at 20:34
  • Also, you can `read` from things that aren't actually files (perhaps you opened a filehandle to a scalarref or `TIEHANDLE`d something), but you can only `sysread` something with a positive `fileno()`. – hobbs Mar 30 '16 at 20:51
  • @ikegami: How about `write` and `syswrite`? – cuonglm Apr 01 '16 at 07:25
  • @cuonglm, They're not even similar. I think you mean `print` and `syswrite`. Of the two, I've only ever used `print` because it's easier to use. I don't know if there are any other differences. – ikegami Apr 04 '16 at 15:52