
This question continues from what I learnt from my question yesterday titled using git to distribute nightly builds.

In the answers to that question it became clear that git would not suit my needs, and I was encouraged to re-examine using BitTorrent.


Short Version

Need to distribute nightly builds to 70+ people each morning; I would like to use BitTorrent to load balance the transfer.

Long Version

NB. You can skip the below paragraph if you have read my previous question.

Each morning we need to distribute our nightly build to the studio of 70+ people (artists, testers, programmers, production etc). Up until now we have copied the build to a server and have written a sync program that fetches it (using Robocopy underneath); even with mirrors set up, the transfer speed is unacceptably slow, taking up to an hour or longer to sync at peak times (off-peak times are roughly 15 minutes), which points to a hardware I/O bottleneck and possibly network bandwidth.

What I know so far

What I have found so far:

  • I have found the excellent entry on Wikipedia about the BitTorrent protocol, which was an interesting read (I had only previously known the basics of how torrents worked). I also found this StackOverflow answer on the BITFIELD exchange that happens after the client-server handshake (a small decoding sketch follows this list).

  • I have also found the MonoTorrent C# Library (GitHub Source) that I can use to write our own tracker and client. We cannot use off the shelf trackers or clients (e.g. uTorrent).
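
For my own notes, here is a minimal sketch of decoding that BITFIELD message, assuming the standard wire format (one bit per piece, with the most significant bit of the first byte being piece 0); the method name is mine, and the piece count would come from the .torrent metadata:

```csharp
// Decode a BITFIELD payload (the bytes after the length prefix and message id 5)
// into a "peer has piece i" flag per piece. The most significant bit of the first
// byte corresponds to piece 0; pieceCount comes from the .torrent metadata.
static bool[] DecodeBitfield(byte[] payload, int pieceCount)
{
    var have = new bool[pieceCount];
    for (int piece = 0; piece < pieceCount; piece++)
    {
        int byteIndex = piece / 8;
        int bitIndex = 7 - (piece % 8);
        have[piece] = (payload[byteIndex] & (1 << bitIndex)) != 0;
    }
    return have;
}
```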

Questions

In my initial design, I have our build system creating a .torrent file and adding it to the tracker. I would super-seed the torrent using our existing mirrors of the build.

Using this design, would I need to create a new .torrent file for each new build? In other words, would it be possible to create a "rolling" .torrent where, if the content of the build has only changed 20%, that is all that needs to be downloaded to get the latest build?

... Actually, in writing the above question I think I would need to create a new file; however, I would be able to download to the same location on the user's machine and the hash check would automatically determine what I already have. Is this correct?
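
To make the hash-check idea concrete, here is a rough sketch of the kind of check I imagine the client doing, assuming the usual one-SHA-1-hash-per-piece scheme; `readPiece` is a hypothetical helper that reads piece i from whatever is already on disk (this is not MonoTorrent API):

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;

// Compare what is already on disk against the piece hashes in the new .torrent.
// Pieces whose hashes match can be reused; only the mismatching/missing pieces
// need to be downloaded again.
static bool[] CheckExistingPieces(byte[][] expectedHashes, Func<int, byte[]> readPiece)
{
    var reusable = new bool[expectedHashes.Length];
    using (var sha1 = SHA1.Create())
    {
        for (int i = 0; i < expectedHashes.Length; i++)
        {
            byte[] onDisk = readPiece(i);   // bytes currently on disk for piece i
            reusable[i] = sha1.ComputeHash(onDisk).SequenceEqual(expectedHashes[i]);
        }
    }
    return reusable;
}
```

If that is how it works, unchanged files hash to the same pieces, so re-downloading into the same folder should only fetch the pieces that actually changed.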

In response to comments

  1. For a completely fresh sync, the entire build (including: the game, source code, localized data, and disc images for PS3 and X360) is ~37,000 files and comes in at just under 50 GB. This is going to increase as production continues. This sync took 29 minutes to complete (roughly 29 MB/s sustained) at a time when only 2 other syncs were happening, which is low-peak if you consider that at 9 am we would have 50+ people wanting to get the latest build.

  2. We have investigated the disk I/O and network bandwidth with the IT dept; the conclusion was that the network storage was being saturated. We are also recording statistics on syncs to a database, and these records show that even with a handful of users we are getting unacceptable transfer rates.

  3. In regard to not using off-the-shelf clients, there is a legal concern with having an application like uTorrent installed on users' machines, given that other items can easily be downloaded using that program. We also want to have a custom workflow for determining which build you want to get (e.g. only PS3 or X360 depending on which DEVKIT you have on your desk) and to have notifications of new builds available, etc. Creating a client using MonoTorrent is not the part that I'm concerned about.

Dennis
  • What is the size of the files you distribute? Have you tried good compression? You may also use a binary diff tool against the previous version; the patch will be enough for almost everybody (others will download the full package). – Guillaume Sep 08 '11 at 08:03
  • Are you sure changing the protocol/tool will fix the problem? Have you done any real math about what you're trying to distribute on your network compared to your hardware, network bandwidth, etc.? For example, have you checked the file system cache (cf: http://blogs.technet.com/b/askperf/archive/2007/05/08/slow-large-file-copy-issues.aspx)? – Simon Mourier Sep 08 '11 at 08:08
  • I can't really see why you can't use off-the-shelf clients; are you running in-house web browsers and word processors too? – grapefrukt Sep 08 '11 at 09:14
  • Updated question with replies to comments. – Dennis Sep 08 '11 at 10:05
  • What about using e-mule out-of-the-box for that? – Daniel Mošmondor Sep 09 '11 at 06:34

4 Answers


To the question whether or not you need to create a new .torrent, the answer is: yes.

However, depending a bit on the layout of your data, you may be able to do some simple semi-delta-updates.

If the data you distribute is a large collection of individual files, where each build may change only some of those files, you can simply create a new .torrent file and have all clients download it to the same location as the old one (just like you suggest). The clients would first check the files that already exist on disk, update the ones that have changed, and download any new files. The main drawback is that removed files would not actually be deleted on the clients.

If you're writing your own client anyway, deleting files on the filesystem that aren't in the .torrent file is a fairly simple step that can be done separately.
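
A minimal sketch of that cleanup, assuming you can enumerate the relative file paths from the new .torrent's metadata (how you get `torrentFiles` depends on the library you use):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Delete anything under the build folder that is not listed in the new .torrent.
// torrentFiles: relative paths taken from the torrent metadata.
// buildRoot: the folder every client syncs the build into.
static void RemoveStaleFiles(string buildRoot, IEnumerable<string> torrentFiles)
{
    var keep = new HashSet<string>(torrentFiles, StringComparer.OrdinalIgnoreCase);

    foreach (string path in Directory.EnumerateFiles(buildRoot, "*", SearchOption.AllDirectories))
    {
        string relative = path.Substring(buildRoot.Length).TrimStart('\\', '/');
        if (!keep.Contains(relative))
            File.Delete(path);   // no longer part of the build
    }
}
```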

This does not work if you distribute an image file, since the bits that stayed the same across versions may have moved, yielding different piece hashes.

I would not necessarily recommend using super-seeding. Depending on how strict the super seeding implementation you use is, it may actually harm transfer rates. Keep in mind that the purpose of super seeding is to minimize the number of bytes sent from the seed, not to maximize the transfer rate. If all your clients are behaving properly (i.e. using rarest first), the piece distribution shouldn't be a problem anyway.

Also, creating a torrent and hash-checking a 50 GiB torrent puts a lot of load on the drive; you may want to benchmark the BitTorrent implementation you use for this, to make sure it's performant enough. At 50 GiB, the difference between implementations may be significant.
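
A quick way to put a floor under that hash-check time, independent of any particular BitTorrent implementation, is to just read the whole build and SHA-1 it while timing the run; this is only a rough sketch:

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Security.Cryptography;

// Rough lower bound on hash-check time: sequentially read and SHA-1 every file.
// A real hash-check has to do at least this much I/O and hashing work.
static void BenchmarkHashing(string buildRoot)
{
    long totalBytes = 0;
    var timer = Stopwatch.StartNew();

    using (var sha1 = SHA1.Create())
    {
        foreach (string path in Directory.EnumerateFiles(buildRoot, "*", SearchOption.AllDirectories))
        {
            using (var stream = File.OpenRead(path))
            {
                sha1.ComputeHash(stream);   // result discarded; only throughput matters here
                totalBytes += stream.Length;
            }
        }
    }

    timer.Stop();
    Console.WriteLine("Hashed {0:F1} GB in {1:F1} min ({2:F0} MB/s)",
        totalBytes / 1e9, timer.Elapsed.TotalMinutes,
        totalBytes / 1e6 / timer.Elapsed.TotalSeconds);
}
```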

Arvid

Just wanted to add a few non-BitTorrent suggestions for your perusal:

  • If the delta between nightly builds is not significant, you may be able to use rsync to reduce your network traffic and decrease the time it takes to copy the build. At a previous company we used rsync to submit builds to our publisher, as we found our disc images didn't change much build-to-build.

  • Have you considered simply staggering the copy operations so that clients aren't slowing down the transfer for each other? We've been using a simple Python script internally when we do milestone branches: the script goes to sleep until a random time in a specified range, wakes up, downloads and checks out the required repositories and runs a build. The user runs the script when leaving work for the day, and when they return they have a fresh copy of everything ready to go (a rough equivalent is sketched below).
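
A rough sketch of that staggering idea, if you would rather keep everything in C#; the sync executable path, its argument and the one-hour window are just placeholders for whatever you already use:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Sleep for a random delay within a window, then launch the existing sync tool,
// so 70+ clients don't all hit the file server at the same moment.
// The executable path and arguments are placeholders.
static void StaggeredSync()
{
    var random = new Random();
    TimeSpan delay = TimeSpan.FromMinutes(random.Next(0, 60));
    Thread.Sleep(delay);

    using (var sync = Process.Start(@"C:\Tools\SyncBuild.exe", "/latest"))
    {
        sync.WaitForExit();
    }
}
```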

Blair Holloway

You could use BitTorrent Sync, which is something of an alternative to Dropbox but without a server in the cloud. It allows you to synchronize any number of folders and files of any size with several people, and it uses the same algorithms as the BitTorrent protocol. You can create a read-only folder and share the key with others. This method removes the need to create a new .torrent file for each build.

JuanMa Cuevas
  • I've only just read about Sync on `\.` and how in the last 6 months it has transferred 1PB of data. However, it did not immediately occur to me that I could use it for this purpose. Thanks! – Dennis May 07 '13 at 14:26

Just to throw another option into the mix, have you considered BITS? I've not used it myself, but from reading the documentation it supports a distributed peer-caching model, which sounds like it will achieve what you want.

The downside is that it is a background service, so it will give up network bandwidth in favour of user-initiated activity - nice for your users, but possibly not what you want if you need data on a machine in a hurry.

Still, it's another option.

MarcE
  • Thanks for the suggestion. We had a look at BITS (Background Intelligent Transfer Service) and will perhaps use that as a short-term solution. – Dennis Sep 08 '11 at 09:14
  • BITS works great as a background downloader **BUT** according to the documentation: _"BITS 3.0: Starting with Windows 7, the BITS 3.0 peer caching model is deprecated. If BITS 4.0 is installed, the BITS 3.0 peer caching model is unavailable. For more information, see Peer Caching."_ – Ian Mercer Sep 09 '11 at 05:34
  • @Hightechrider: Thanks for additional information about BITS caching model. – Dennis Sep 09 '11 at 06:20