Questions tagged [alluxio]

Alluxio is an open source memory-centric distributed file system written in Java. It acts as an in-memory data caching layer between applications and data storage systems. The software is published under the Apache License.

Alluxio (formerly Tachyon) is an open source memory-speed distributed file system. It is a data layer between compute and storage, abstracting the files or objects in underlying persistent storage systems and providing a shared data access layer for compute applications. Alluxio was developed at the University of California, Berkeley's AMPLab.
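
Here is a toy Java model of one namespace laid over several storage systems. It assumes a simple mount table that maps logical paths to under-storage URIs by longest-prefix match. The class ToyMountTable and the example URIs (hdfs://namenode:8020/, s3://bucket/prefix/) are hypothetical. This is a sketch of the concept, not Alluxio's actual implementation or configuration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of a unified namespace; not Alluxio's implementation.
class ToyMountTable {
    // Logical mount point (no trailing slash; root becomes "") -> under-storage URI prefix.
    private final Map<String, String> mounts = new HashMap<>();

    void mount(String mountPoint, String ufsUri) {
        mounts.put(trimSlash(mountPoint), trimSlash(ufsUri));
    }

    // Translates a logical path to an under-storage URI using the longest matching mount point.
    String resolve(String path) {
        String best = null;
        for (String mp : mounts.keySet()) {
            boolean covers = path.equals(mp) || path.startsWith(mp + "/");
            if (covers && (best == null || mp.length() > best.length())) {
                best = mp;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("No mount point covers " + path);
        }
        return mounts.get(best) + path.substring(best.length());
    }

    // Removes one trailing slash so "/" and "s3://bucket/prefix/" normalize consistently.
    private static String trimSlash(String s) {
        return s.endsWith("/") ? s.substring(0, s.length() - 1) : s;
    }
}
```

Suppose "/" is mounted on the HDFS URI and "/s3" on the S3 URI. Then "/s3/logs/a.txt" resolves to the S3 object, and "/data/x.csv" falls through to HDFS. Applications only see the logical paths.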

Alluxio can be used as a distributed shared caching service for big data analytics frameworks such as mapreduce and apache-spark. Compute applications talking to Alluxio can then transparently cache frequently accessed data, especially data from remote locations, and get in-memory I/O throughput.
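
Transparent caching of this kind can be pictured as a read-through cache. Reads are served from memory when possible and fall back to the slower under storage on a miss. The sketch below assumes a single-node, in-process cache with least-recently-used (LRU) eviction. ReadThroughCache is a hypothetical name, and Alluxio's real tiered storage and eviction policies may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical read-through cache with LRU eviction; a toy model, not Alluxio's API.
class ReadThroughCache {
    private final Function<String, byte[]> underStorage; // slow backing store
    private final LinkedHashMap<String, byte[]> cache;
    private int storageReads = 0; // how many reads had to go to under storage

    ReadThroughCache(int capacity, Function<String, byte[]> underStorage) {
        this.underStorage = underStorage;
        // accessOrder = true keeps the least recently used entry first.
        this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > capacity; // evict the LRU entry once over capacity
            }
        };
    }

    // Returns cached bytes on a hit; on a miss, loads from under storage and caches them.
    byte[] read(String path) {
        byte[] data = cache.get(path);
        if (data == null) {
            storageReads++;
            data = underStorage.apply(path);
            cache.put(path, data);
        }
        return data;
    }

    int storageReads() {
        return storageReads;
    }
}
```

With a capacity of 2, reading "/a" twice touches under storage only once. Reading "/b" and then "/c" evicts "/a", so the next read of "/a" is a miss again.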

Alluxio can also simplify cloud and object storage adoption. Cloud and object storage systems use different semantics from traditional file systems, and these differences have performance implications. For example, when accessing data in cloud storage there is no node-level locality or cross-application caching. Common file system operations like directory listing (‘ls’) and ‘rename’ also have different performance characteristics, which often add significant overhead to analytics. Deploying Alluxio with cloud or object storage can close the semantics gap and achieve significant performance gains.
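
Renaming is expensive on object storage because most object stores expose a flat key space with no directory rename operation. Clients therefore typically emulate renaming a "directory" by copying and then deleting every object under a prefix. The toy Java model below follows that assumption. ToyObjectStore and its request counter are hypothetical, and the counter only tallies calls rather than measuring real latency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical flat object store: keys only, no real directories.
class ToyObjectStore {
    final TreeMap<String, byte[]> objects = new TreeMap<>();
    int requests = 0; // counts list/copy/delete calls as if each were a remote request

    // Lists keys under a prefix; sorted keys make the matching range contiguous.
    List<String> list(String prefix) {
        requests++;
        List<String> keys = new ArrayList<>();
        for (String k : objects.tailMap(prefix, true).keySet()) {
            if (!k.startsWith(prefix)) break;
            keys.add(k);
        }
        return keys;
    }

    // Emulates a "directory" rename: one copy and one delete per object.
    void renamePrefix(String src, String dst) {
        for (String key : list(src)) {
            requests++; // copy to the new key
            objects.put(dst + key.substring(src.length()), objects.get(key));
            requests++; // delete the old key
            objects.remove(key);
        }
    }
}
```

In this model, renaming a prefix holding 1,000 objects costs 1 list plus 2,000 copy and delete requests. A rename on a file system such as HDFS is a single metadata operation, which is the gap a caching and namespace layer can help hide.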

Alluxio is written in Java and hosted on GitHub.

The latest stable version:

Alluxio 1.8.1 - Sept 27, 2018

Recommended reference sources:

90 questions

2 answers

Errors when using OFF_HEAP Storage with Spark 1.4.0 and Tachyon 0.6.4

I am trying to persist my RDD using off-heap storage on Spark 1.4.0 and Tachyon 0.6.4, doing it like this:

    val a = sqlContext.parquetFile("a1.parquet")
    a.persist(org.apache.spark.storage.StorageLevel.OFF_HEAP)
    a.count()

Afterwards I am getting the…

apache-spark apache-spark-sql alluxio

asked May 06 '15 at 20:37

qwertz1123

1 answer

Is Tachyon by default implemented by the RDDs in Apache Spark?

I'm trying to understand Spark's in-memory feature. In the process I came across Tachyon, which is basically an in-memory data layer that provides fault tolerance without replication by using lineage systems and reduces re-computation by…

apache-spark bigdata rdd in-memory-database alluxio

asked Apr 22 '15 at 13:53

Himanshu Mehra

1 answer

Resources/Documentation on how the failover process works for the Spark Driver (and its YARN Container) in yarn-cluster mode

I'm trying to understand whether the Spark Driver is a single point of failure when deploying in cluster mode on YARN. So I'd like to get a better grasp of the innards of the failover process regarding the YARN Container of the Spark Driver in this…

apache-spark hadoop hadoop-yarn alluxio

asked Jan 18 '15 at 12:29

MiguelPeralvo

1 answer

Spark Tachyon: How to delete a file?

In Scala, as an experiment, I create a sequence file on Tachyon using Spark and read it back in. I also want to delete the file from Tachyon using the Spark script:

    val rdd = sc.parallelize(Array(("a",2), ("b",3),…

scala apache-spark alluxio

asked Jul 19 '14 at 02:45

bjjer

0 answers

Spark concurrency performance issue vs. Presto

We are benchmarking Spark with Alluxio and Presto with Alluxio. To evaluate performance, we took 5 different queries (with some joins, group by and sort) and ran them on a 650 GB dataset in ORC format. The Spark execution environment is set up in such a…

apache-spark presto trino alluxio

asked May 02 '18 at 04:30

Rijo Joseph

1 answer

Alluxio Error:java.lang.IllegalArgumentException: Wrong FS

I am able to run wordcount on Alluxio with an example jar provided by Cloudera, using:

    sudo -u hdfs hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar wordcount -libjars…

hadoop mapreduce hdfs cloudera-cdh alluxio

asked Dec 23 '16 at 04:43

Sambhu R

1 answer

What's the difference between Apache Ignite and Tachyon?

I am new to Apache Ignite. For the Ignite and Spark integration, it looks like Ignite provides an in-memory layer in which data lives across Spark applications, which is the capability that Tachyon provides as an in-memory file system. So, my…

apache-spark ignite alluxio

asked Dec 06 '16 at 07:52

Tom

1 answer

How to use Tachyon to share data between Spark jobs

I'm a beginner with Tachyon. I want to share some data or RDDs between Spark jobs. The Tachyon overview says Tachyon is an open source memory-centric distributed storage system enabling reliable data sharing at memory-speed across cluster jobs. But I…

apache-spark alluxio

asked Jun 26 '16 at 14:10

starrynight92

2 answers

What is the difference between a distributed cache and Tachyon?

A distributed cache is a method that stores common requests and enables quick retrieval. Tachyon is a memory-centric distributed storage file system that avoids going to disk to load datasets that are frequently read. What is the difference between…

apache-spark distributed-caching distributed-cache alluxio

asked Sep 16 '15 at 07:59

Venu A Positive

2 answers

How to convert Spark RDD to Mahout DRM?

I am fetching data from Alluxio in Mahout using sc.textFile(), but it returns a Spark RDD. My program then uses this Spark RDD as a Mahout DRM, so I need to convert the RDD to a DRM so that my current code remains stable.

apache-spark mahout alluxio

asked Apr 07 '17 at 05:16

user2738965

1 answer

Why do mtime and atime need to be updated?

Does anyone know why the mtime and atime need to be updated when completing the file?

    mInodeTree.updateInode(rpcContext, UpdateInodeEntry.newBuilder()
        .setId(inode.getId())
        .setUfsFingerprint(ufsFingerprint)
        …

alluxio

asked May 26 '22 at 22:38

ChanChan Mao

1 answer

The difference between invoking Maven directly in a shell and invoking it from IntelliJ IDEA

Edit 3: I also tried to set the Maven proxy through the Java option parameters mentioned in this thread. Edit 2: I'm sure IntelliJ IDEA is using the same settings.xml, the same Maven binary and the same local repository as the system Maven. Edit 1: I tried to…

java maven intellij-idea alluxio

asked Jan 01 '20 at 09:59

Eugene

1 answer

Hive: modifying an external table's location takes too long

Hive has two kinds of tables, Managed and External Tables; for the difference, see Managed vs. External Tables. Currently, to move an external database from HDFS to Alluxio, I need to modify the external tables' location to…

hadoop hive bigdata alluxio

asked Aug 26 '19 at 07:54

Eugene

1 answer

Spark job failed to write to Alluxio due to DeadlineExceededException

I am running a Spark job writing to an Alluxio cluster with 20 workers (Alluxio 1.6.1). The Spark job failed to write its output due to alluxio.exception.status.DeadlineExceededException. The worker still shows as alive in the Alluxio WebUI. How can I avoid…

apache-spark alluxio

asked Nov 15 '18 at 18:20

apc999

1 answer

Alluxio with/without HDFS

I have a cluster with HDFS as the under storage distributed file system, but I've just read about Alluxio, which is fast and flexible. So my question is: should I use Alluxio with HDFS, or is Alluxio an alternative to HDFS? (I see on their site that…

hadoop hdfs distributed-filesystem alluxio

asked Aug 30 '18 at 13:49

DAVID_ROA
