Questions tagged [distributed-filesystem]

Any file system that allows access to files from multiple hosts over a computer network, making it possible for multiple users on multiple machines to share files and storage resources.

56 questions
167
votes
13 answers

FileSystemWatcher vs polling to watch for file changes

I need to set up an application that watches for files being created in a directory, either locally or on a network drive. Would FileSystemWatcher or polling on a timer be the better option? I have used both methods in the past, but not…
Jon Tackabury
28
votes
5 answers

Lustre, Gluster or MogileFS for video storage, encoding and streaming?

So many options and so little time to test them all... I wonder if anyone has experience with distributed file systems for video streaming and storage/encoding. I have a lot of huge video files (50GB to 250GB) that I need to store somewhere, be…
Horacio
8
votes
1 answer

Sharding vs DFS

As far as I understand, sharding (e.g. in MongoDB) and distributed file systems (e.g. HDFS in HBase or HyperTable) are different mechanisms that databases use to scale out. How do they compare?
Ali Shakiba
8
votes
2 answers

Object storage for a web application

I am currently working on a website where roughly 40 million documents and images should be served to its users. I need suggestions on which method is most suitable for storing content subject to these requirements. System should be…
6
votes
1 answer

How do you read and write from/into different ElasticSearch clusters using spark and elasticsearch-hadoop?

Original title: Besides HDFS, what other DFS does Spark support (and which are recommended)? I am happily using Spark and Elasticsearch (with the elasticsearch-hadoop driver) with several gigantic clusters. From time to time, I would like to pull the entire…
6
votes
1 answer

Distributed key-value storage for total data size of 80TB

TL;DR: I'd like to have recommendations for a distributed key-value storage, for avg. entry size of up to 50KB, to be installed on a Linux environment (dedicated servers). A file-system solution would do. I found a few solutions: Ceph, Cassandra,…
Ron Klein
5
votes
1 answer

What's the best way of letting people upload files in an AWS load balanced environment?

Let's say you have instance1, instance2, and instance3 running in AWS. They are all running Apache, and the web application that you run needs to allow users to upload images, which is the case in many projects. Also when you are showing the image…
4
votes
1 answer

What is a Content Delivery Network and Distributed File System?

I am trying to widen my knowledge of distributed systems and systems design. I came across terms such as Content Delivery Network and Distributed File System for storing/handling media data such as music, videos, pictures, gifs,…
4
votes
2 answers

Obtain the DFS path of a network location in Python

I want to obtain a ping-like response from a Windows network location that has a Distributed File System architecture e.g. path = r'\\path\to\some\shared\folder_x' delay = ping_func(path) print delay # return response in milliseconds ? 234 Once I…
Alexander McFarlane
4
votes
1 answer

CoreOS & HDFS - Running a distributed file system in Linux Containers/Docker

I need some sort of distributed file system running on a CoreOS cluster. As such, I'd like to run HDFS on CoreOS nodes. Is this possible? I can see 2 options: Expand CoreOS - Install HDFS directly onto CoreOS - not ideal as it breaks the whole…
NightWolf
4
votes
1 answer

Writing files locally vs. remote file system?

My question is about remote file systems on Windows. Suppose you have workstation X, which has access to file systems on the network, say \\ServerY\MYDir\. Imagine a scenario in which you have two simultaneous threads on X. Thread 1 is writing a…
3
votes
0 answers

How to compute the distance matrix in pyspark?

I have a dataset with 100,000 records. I need to find the Euclidean distance matrix for this dataset. It should create a 100,000 × 100,000 matrix. In Python we have squareform(pdist(x)). As I cannot perform the same function on the RDD, how to do it on…
3
votes
4 answers

IPFS file not downloading

I am using IPFS to share files with other users. I upload a file and then stop the IPFS daemon. Ideally, the file should have been distributed across other peers and be downloadable by the recipient regardless of whether the sender…
Mahesh H Viraktamath
3
votes
2 answers

Distributed file systems supported by Python/Dask

Which distributed file systems are supported by Dask? Specifically, from which file systems can one read dask.dataframes? From the Dask documentation I can see that HDFS is certainly supported. Are any other distributed file systems supported,…
S.V
3
votes
0 answers

Java client for XtreemFS

I am building a Java web application which will require storing and retrieving large files. I would like to be able to scale the application in the future, so I was planning on using XtreemFS (http://www.xtreemfs.org/) as a distributed file system. It…
troymass