
I tried using sparklyr to write data to HDFS or Hive, but was unable to find a way. Is it even possible to write an R dataframe to HDFS or Hive using sparklyr? Please note that my R and Hadoop installations run on two different servers, so I need a way to write to a remote HDFS from R.

Regards, Rahul

  • Have you tried to run Spark in yarn mode? [This](https://stackoverflow.com/questions/38102921/can-sparklyr-be-used-with-spark-deployed-on-yarn-managed-hadoop-cluster) post might be helpful. – michalrudko Jun 27 '17 at 22:41
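
Picking up on the comment above about YARN mode, here is a minimal sketch of how such a connection could look from the R side. It assumes Spark is installed on the machine running R, that `HADOOP_CONF_DIR` points at the remote cluster's configuration files, and the `spark_home` path shown is hypothetical:

library(sparklyr)
# Connect to Spark running on the remote YARN-managed Hadoop cluster;
# the cluster location is resolved from the Hadoop/YARN config files
sc <- spark_connect(master = "yarn-client", spark_home = "/opt/spark")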

3 Answers


Writing a Spark table to Hive using sparklyr:

# Copy the local R data frame into Spark, registering it as the temporary view "iris_spark_table"
iris_spark_table <- copy_to(sc, iris, name = "iris_spark_table", overwrite = TRUE)
# Persist the Spark table as a Hive table
DBI::dbGetQuery(sc, "CREATE TABLE iris_hive AS SELECT * FROM iris_spark_table")
Jeereddy
  • Thanks for sharing. This loads the data into Hive's default database. Do you know how to specify a different Hive database to write to? – bshelt141 Feb 26 '18 at 21:37
  • @bshelt141 You can use syntax `database.table` in the SQL passed to `DBI`. – Konrad Jul 26 '19 at 15:02
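
Following up on the last comment, a minimal sketch of the `database.table` syntax; the database name `my_database` is hypothetical and must already exist in Hive:

DBI::dbGetQuery(sc, "CREATE TABLE my_database.iris_hive AS SELECT * FROM iris_spark_table")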

As of the latest sparklyr you can use `spark_write_table`. Pass the name in the format `database.table_name` to specify a database:

iris_spark_table <- copy_to(sc, iris, overwrite = TRUE)
# Write to the Hive table my_database.iris_hive, replacing it if it already exists
spark_write_table(
  iris_spark_table,
  name = 'my_database.iris_hive',
  mode = 'overwrite'
)

Also see this SO post, where I got some input on more options.

blakiseskream

You can use `sdf_copy_to` to copy a dataframe into Spark, say as `tempTable`. Then use `DBI::dbGetQuery(sc, "INSERT INTO TABLE MyHiveTable SELECT * FROM tempTable")` to insert the dataframe's records into a Hive table, as sketched below.
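
A minimal sketch of this approach, assuming `sc` is an existing Spark connection, `my_df` is a hypothetical local R data frame, and `MyHiveTable` is an existing Hive table whose schema matches the data frame:

library(sparklyr)
# Copy the local data frame into Spark under the temporary name "tempTable"
sdf_copy_to(sc, my_df, name = "tempTable", overwrite = TRUE)
# Append its rows to the existing Hive table
DBI::dbGetQuery(sc, "INSERT INTO TABLE MyHiveTable SELECT * FROM tempTable")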

edeg