
I tried using sparklyr to write data to HDFS or Hive, but was unable to find a way. Is it even possible to write an R dataframe to HDFS or Hive using sparklyr? Please note that my R and Hadoop installations run on two different servers, so I need a way to write to a remote HDFS from R.

Regards, Rahul

  • Have you tried to run Spark in yarn mode? [This](https://stackoverflow.com/questions/38102921/can-sparklyr-be-used-with-spark-deployed-on-yarn-managed-hadoop-cluster) post might be helpful. – michalrudko Jun 27 '17 at 22:41
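
Picking up on the comment above about YARN mode, here is a minimal sketch of how such a connection could look from the R side. It assumes Spark is installed on the machine running R, that `HADOOP_CONF_DIR` points at the remote cluster's configuration files, and the `spark_home` path shown is hypothetical:

library(sparklyr)
# Connect to Spark running on the remote YARN-managed Hadoop cluster;
# the cluster location is resolved from the Hadoop/YARN config files
sc <- spark_connect(master = "yarn-client", spark_home = "/opt/spark")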

3 Answers


Writing a Spark table to Hive using sparklyr:

# Copy the local R data frame into Spark, registering it as the temporary view "iris_spark_table"
iris_spark_table <- copy_to(sc, iris, name = "iris_spark_table", overwrite = TRUE)
# Persist the Spark table as a Hive table
DBI::dbGetQuery(sc, "CREATE TABLE iris_hive AS SELECT * FROM iris_spark_table")
Jeereddy
  • Thanks for sharing. This loads the data into Hive's default database. Do you know how to specify a different Hive database to write to? – bshelt141 Feb 26 '18 at 21:37
  • @bshelt141 You can use syntax `database.table` in the SQL passed to `DBI`. – Konrad Jul 26 '19 at 15:02
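
Following up on the last comment, a minimal sketch of the `database.table` syntax; the database name `my_database` is hypothetical and must already exist in Hive:

DBI::dbGetQuery(sc, "CREATE TABLE my_database.iris_hive AS SELECT * FROM iris_spark_table")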

As of the latest sparklyr you can use `spark_write_table`. Pass the name in the format `database.table_name` to specify a database:

iris_spark_table <- copy_to(sc, iris, overwrite = TRUE)
# Write to the Hive table my_database.iris_hive, replacing it if it already exists
spark_write_table(
  iris_spark_table,
  name = 'my_database.iris_hive',
  mode = 'overwrite'
)

Also see this SO post, where I got some input on more options.

blakiseskream

You can use `sdf_copy_to` to copy a dataframe into Spark, say as `tempTable`. Then use `DBI::dbGetQuery(sc, "INSERT INTO TABLE MyHiveTable SELECT * FROM tempTable")` to insert the dataframe's records into a Hive table, as sketched below.
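
A minimal sketch of this approach, assuming `sc` is an existing Spark connection, `my_df` is a hypothetical local R data frame, and `MyHiveTable` is an existing Hive table whose schema matches the data frame:

library(sparklyr)
# Copy the local data frame into Spark under the temporary name "tempTable"
sdf_copy_to(sc, my_df, name = "tempTable", overwrite = TRUE)
# Append its rows to the existing Hive table
DBI::dbGetQuery(sc, "INSERT INTO TABLE MyHiveTable SELECT * FROM tempTable")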

edeg