33

I have a lot of spare Intel Linux servers lying around (hundreds) and want to use them for a distributed file system in a web hosting and file sharing environment. This isn't for an HPC application, so high performance isn't critical. The main requirement is high availability: if one server goes offline, the data stored on its hard drives must still be available from other nodes. It must run over TCP/IP and provide standard POSIX file permissions.
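Since standard POSIX permissions are a hard requirement, one quick way to vet any candidate is to mount it and confirm that permission bits survive a round trip. A minimal sketch in Python, assuming a hypothetical test mount at /mnt/dfs-test (this only checks permission semantics, not availability):

```python
#!/usr/bin/env python3
"""Sanity-check POSIX permission bits on a candidate distributed filesystem."""
import os
import stat
import tempfile

MOUNT = "/mnt/dfs-test"  # placeholder: mount point of the filesystem under test

def permissions_survive(mount=MOUNT):
    # Create a scratch file on the mounted filesystem.
    fd, path = tempfile.mkstemp(dir=mount)
    os.close(fd)
    try:
        # Set a distinctive mode and read it back through stat().
        os.chmod(path, 0o640)
        return stat.S_IMODE(os.stat(path).st_mode) == 0o640
    finally:
        os.unlink(path)

if __name__ == "__main__":
    print("POSIX permission bits preserved:", permissions_survive())
```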

I've looked at the following:

  • Lustre (http://wiki.lustre.org/index.php?title=Main_Page): Comes really close, but it doesn't provide redundancy for data on a node; you must make the data HA yourself using RAID or DRBD. Supported by Sun and open source, so it should be around for a while.

  • gfarm (http://datafarm.apgrid.org/): Looks like it provides the redundancy but at the cost of complexity and maintainability. Not as well supported as Lustre.

Does anyone have any experience with these or any other systems that might work?

Eric

7 Answers

21

Check also GlusterFS.

Edit (Aug-2012): Ceph is finally getting ready. Recently the authors formed Inktank, an independent company to sell commercial support for it. According to some presentations, the mountable POSIX-compliant filesystem is the uppermost layer and not really tested yet, but the lower layers have been used in production for some time now.

The interesting part is the RADOS layer, which presents object-based storage with both 'native' access via the librados library (available for several languages) and an Amazon S3-compatible REST API. Either one makes it more than adequate for adding massive storage to a web service.
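To make the 'native' path concrete, here is a minimal sketch using the Python rados bindings; the ceph.conf path, pool name and object name are assumptions, and the pool must already exist:

```python
import rados  # python-rados bindings shipped with Ceph

# Placeholder config path; the cluster address and keyring come from ceph.conf.
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    # "web-assets" is a hypothetical, pre-created pool.
    ioctx = cluster.open_ioctx("web-assets")
    try:
        # Write an object and read it back.
        ioctx.write_full("hello.txt", b"stored via librados")
        print(ioctx.read("hello.txt"))
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```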

This video is a good description of the philosophy, architecture, capabilities and current status.
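For the S3-compatible REST API mentioned above, any ordinary S3 client pointed at the RADOS Gateway should work. A sketch using boto3, where the endpoint, port, bucket and credentials are all placeholders:

```python
import boto3

# Placeholder endpoint and credentials for a hypothetical RADOS Gateway.
s3 = boto3.client(
    "s3",
    endpoint_url="http://radosgw.example.com:7480",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

s3.create_bucket(Bucket="web-assets")
s3.put_object(Bucket="web-assets", Key="hello.txt", Body=b"stored via the S3 gateway")
print(s3.get_object(Bucket="web-assets", Key="hello.txt")["Body"].read())
```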

Javier
  • I was disappointed by glusterfs performance / reliability under heavy IO loads. – Omry Yadan May 29 '12 at 11:18
  • Can you please share what "heavy IO loads" means? How many IOPS? – David Rabinowitz Jul 29 '12 at 11:33
  • What happens if a node falls out? I'm curious about a "gluster"-like setup where a server can contribute data (for redundancy, or for additional storage, at the server's choice) and disconnect whenever it wants without destroying the "raid array". – isaaclw Oct 12 '12 at 16:13
  • Having used it extensively, I would describe the POSIX filesystem layer of ceph as experimental and horribly buggy, FYI. – Paul Wheeler Nov 28 '12 at 19:47
  • @PaulWheeler: I concur. What I wanted to note is that the other non-fs-like layers (RADOS, rbd) are getting quite reliable. For POSIX compatibility, it seems MooseFS is much better. I'd love to see ceph-fs mature, since rbd is quite desirable to have in the same cluster... – Javier Nov 28 '12 at 22:15
  • IMHO Ceph is nowhere near ready and is certainly not going to become ready in the near future. I recommend avoiding Ceph, as it has too many issues to be used with a reasonable degree of confidence. – Onlyjob Dec 23 '14 at 00:14
  • I also had a pretty bad experience with glusterfs during heavy IO load; sometimes the server would crash and files would start disappearing. – Aftab Naveed Feb 12 '15 at 05:48
5

In my opinion, the best file system for Linux is MooseFS. It's quite new, but I had an opportunity to compare it with Ceph and Lustre, and I can say for sure that MooseFS is the best one.

  • Agreed, with a correction: MooseFS is now proprietary, so its successor [LizardFS](http://lizardfs.org) is the best IMHO. – Onlyjob Dec 23 '14 at 00:17
  • @Onlyjob - MooseFS is no longer proprietary – warren Apr 14 '15 at 16:55
  • Technically speaking. But it does not have a public VCS or a bug tracker. What if the author takes down the source archive and provides it only by request *again*? LizardFS already has a community behind it and (unlike MooseFS) LizardFS will be in Debian soon. LizardFS is unrestricted (i.e. no "community edition" etc.). – Onlyjob Apr 15 '15 at 02:51
  • MooseFS source code is available on GitHub: https://github.com/moosefs/moosefs – Jakub Jul 14 '17 at 08:25
4

Gluster is getting quite a lot of press at the moment:

http://www.gluster.org/

George
  • @dpavlin - does it matter if it's a duplicate? Yes, the answerer shouldn't have added it since it was already there, but downvoting just because it's a duplicate seems wrong – warren Jul 20 '11 at 14:17
  • Glusterfs is fat, eats lots of memory during high IO load, and is very slow. – Aftab Naveed Feb 12 '15 at 05:48
2

Lustre has been working for us. It's not perfect, but it's the only thing we have tried that has not broken down under load. We still get LBUGs from time to time, and dealing with 100 TB+ file systems is never easy, but the Lustre system has worked and increased both performance and availability.

2

Unless someone forces you to use it, I would highly recommend using anything other than Lustre. From what I hear from others, and what also gave me nightmares for quite some time, Lustre quite easily breaks down in all kinds of situations. And if even a single client in the system breaks down, it typically puts itself into an endless do_nothing_loop mode while holding some important global lock - so the next time another client tries to access the same information, it will also hang. Thus, you often end up rebooting the whole cluster, which I guess is something you would normally try to avoid ;)

Modern parallel file systems like FhGFS (http://www.fhgfs.com) are much more robust here and also allow you to do nice things like running server and client components on the same machines (built-in HA features are still under development, as someone from their team told me, but their implementation is going to be pretty awesome from what I've heard).

kurtenbach
0

Ceph looks to be a promising new-ish entry into the arena. The site claims it's not ready for production use yet though.

kbyrd
0

I read a lot about distributed filesystems and I think FhGFS is the best.

http://www.fhgfs.com/

It's worth a try. See more about it at:

http://www.fhgfs.com/wiki/