
I'm trying to reduce the time Spark spends reading and writing data by using Alluxio.

But I found that I have to specify the Alluxio path explicitly when reading data.

I've found that I can use Hive's metatool to change Hive's warehouse location from HDFS to Alluxio, so I can write data to Alluxio through Spark SQL. But I don't know how to read Alluxio's data with SQL.
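The warehouse switch mentioned above is done with Hive's metatool service. A sketch of the commands, assuming placeholder host names and ports (`alluxio-master:19998`, `namenode:8020`) that you would replace with your own cluster's:

```shell
# Rewrite the warehouse root recorded in the Hive metastore
# from HDFS to Alluxio (new location first, old location second).
hive --service metatool -updateLocation \
  alluxio://alluxio-master:19998/user/hive/warehouse \
  hdfs://namenode:8020/user/hive/warehouse

# Verify which filesystem roots the metastore now records.
hive --service metatool -listFSRoot
```

Note this only affects the warehouse root for tables created afterwards; existing tables keep the locations already stored in the metastore.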

Is there any way to read/write Alluxio's data just like Hive tables? Maybe by reading Alluxio's metadata and adding it to the metastore?

asked by lulijun, edited by dtolnay
  • Alluxio supports the Hadoop FileSystem API, so you should be able to read data from Alluxio exactly how you read it from HDFS. Can you explain what you're doing to read the data from Alluxio through Spark sql, and what issues you're running into? – AAudibert Jan 25 '18 at 22:18

1 Answer


All you need to do is modify the table's location in the metastore that Spark uses.
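A minimal sketch of changing a table's location, assuming a hypothetical table name `my_table` and a placeholder Alluxio master address; run these through Spark SQL or the Hive CLI:

```sql
-- Point the table at its data in Alluxio instead of HDFS.
ALTER TABLE my_table
  SET LOCATION 'alluxio://alluxio-master:19998/user/hive/warehouse/my_table';

-- For a partitioned table, each partition stores its own location
-- in the metastore and must be updated as well:
ALTER TABLE my_table PARTITION (dt='2018-01-25')
  SET LOCATION 'alluxio://alluxio-master:19998/user/hive/warehouse/my_table/dt=2018-01-25';
```

After that, an ordinary `SELECT * FROM my_table` in Spark SQL reads through Alluxio with no path in the query, because Alluxio implements the Hadoop FileSystem API.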

You can check the Alluxio documentation for details; if altering the table location takes too long, check this thread for help.

Note that the first time you query the table, Alluxio will fetch the data from the UFS (under file system). Once the data is cached in Alluxio, future queries on the table will read directly from Alluxio.

answered by Eugene