
I was studying Go using "Network Programming with Go" by Jan Newmarch, and I noticed that almost all of his examples use a [512]byte buffer for reading from and writing to a connection.

I tried searching online but failed to find an answer. I suspect it has something to do with I/O, but I am not sure of the exact reason behind this design.

Could anyone elaborate a bit on the choice of the buffer?

Some sample code from the book:

func handleConn(c net.Conn) {
    defer c.Close()

    var buf [512]byte

    for {
        n, err := c.Read(buf[0:])
        if err != nil {
            return
        }

        // Echo back only the n bytes actually read, not the whole buffer.
        _, err2 := c.Write(buf[0:n])
        if err2 != nil {
            return
        }
    }
}
Xu Chen
  • It's a convenient size. You mean, why 512 instead of something like 500? – JimB Jun 10 '16 at 15:27
  • Yes, but more like instead of 256? or anything power of 2 and relatively large. – Xu Chen Jun 10 '16 at 15:34
  • 1
    Found a good thread about that: http://stackoverflow.com/questions/236861/how-do-you-determine-the-ideal-buffer-size-when-using-fileinputstream – molivier Jun 10 '16 at 15:37
  • 4
    It's just an arbitrary value. For example, the `bufio.Reader` default is 4096, which is used throughout the http package. – JimB Jun 10 '16 at 15:39
  • @molivier: that's specifically about the filesystem, and is less applicable to the network. – JimB Jun 10 '16 at 15:40
  • 2
    @JimB Yes sorry I paste the wrong link: http://stackoverflow.com/questions/2811006/what-is-a-good-buffer-size-for-socket-programming – molivier Jun 10 '16 at 15:47
  • The key is that the buffer size should strike a balance between memory size and expected read/write size. The entire point of that type of buffer is to avoid allocations for small reads and/or writes, so if it's sufficiently large that most of your read/writes fit in it, you avoid a lot of extraneous (and unnecessary) allocations of memory. An argument could also be made that such a buffer should ideally fit in a cache line on the expected environment. – Kaedys Jun 10 '16 at 16:07

1 Answer


Not a direct answer but some background in addition to what other folks stated in comments.

The Go types that wrap files and sockets are relatively thin, in the sense that any Read() or Write() call on them results in a syscall being performed (with sockets it's trickier, as they use asynchronous I/O via the system-provided poller such as epoll, kqueue, IOCP etc). This means that reading from a file or the network in chunks of 1 byte is woefully inefficient.

To consider the other extreme, it's possible to allocate, say, a 100 MiB buffer and attempt to pass it to Read(). While the kernel's syscall will happily accept a destination of that size, note that contemporary OSes have internal buffers on network sockets of around 64 KiB¹ in size, so under most circumstances your Read() call will return having read just that much data or less. This means you would be wasting most of your buffer space.

Now comes another consideration: what pattern does your application use for reading data from a socket?

Say, when you're streaming data from a socket to an open file, you don't really care about buffering (you'd rather it were someone else's decision to pick the "right" size). For this case, just use io.Copy() (which currently (Go 1.6) uses an internal buffer of 32 KiB).

Conversely, if you're parsing some application-level protocol utilizing TCP as its transport, you often need to read the data in chunks of arbitrary fixed size. For this case, the best pattern is wrapping the socket in a bufio.Reader — to combat the "small reads" problem outlined above — and then use io.ReadFull() to read data into your local arrays/slices of the size you need (if possible, do reuse your arrays and slices to lower the pressure on the garbage collector).

Another case is text-based "linewise" protocols such as SMTP or HTTP. In these protocols, the maximum line length is typically fixed, and it makes sense to use buffers of the maximum size of the protocol's line to deal with them. (But anyway, to deal with such protocols, it's best to use the net/textproto standard package.)

As to your question per se, my stab at it is that 512 is just a beautiful number which has no special meaning. When you write a book like this, you have to pick some value anyway.

As you can see from my descriptions of real-world patterns of reading from the network, most of the time you simply have no business dealing with buffering — let the standard tools do it for you. You should only resort to tuning that stuff when you're facing a real problem with the defaults provided by the standard packages.

TL;DR

  • The book you're reading merely explains the basic concepts to you, so it has to use some number.
  • Real-world code seems to use other numbers when it needs to buffer (usually higher)…
  • …but you should not concern yourself with these numbers until absolutely necessary: use the ready-made tools where possible.

¹ Of course, I can't speak for all operating systems, they have different knobs to tweak this stuff, and "contemporary" may start to mean something different in a year or less, you know… still, I consider my estimate to be quite close to the truth.

kostix
  • The bytes package in go has a constant called MinRead that is set to 512, so it seems like it does have some meaning. https://golang.org/pkg/bytes/?m=all#pkg-constants – Seth Pollack Nov 24 '16 at 02:59
  • @SethPollack, IMO that's a red herring: from the description, that constant appears to serve a heuristic used to prevent premature growing of the buffer `ReadFrom()` reads the data into. You know... in the end, the only "magic" thing about 512 -- as opposed to, say, 511 or 516, -- is that it's 2^9. 512 is exactly half of a kilobyte (a kibibyte to be precise), or twice as much as 256 which is 2x128, which is 2x64, which is 2x32, which is 2x16, which is 2x8 etc ;-) – kostix Nov 24 '16 at 06:42
  • @SethPollack, I mean, for those who started to dabble with programming back in those days when it usually meant learning some assembly language and machine codes, 512 has that warm cozy feeling. ;-) That's part of the reason such numbers are used. The other -- more serious consideration -- is that on some architectures, memory access is only effective (or even possible!) when the starting address of a memory block of a given size is naturally *aligned*. Say, to read a 64-bit word, it has to be located at an address wholly divisible by 8. – kostix Nov 24 '16 at 06:47
  • @SethPollack, hence operating on memory blocks of such "even" sizes like 512 (which contains an exact/whole number of the H/W architecture words -- be they 16-, 32-, 64- or 128-bit in size) could allow the compiler to generate code which accesses that memory in the most efficient way possible. – kostix Nov 24 '16 at 06:49