
How do I convert a SparkDataFrame from SparkR into a tbl_spark from sparklyr?

A similar question was asked here: Convert spark dataframe to sparklyR table "tbl_spark".

The suggestion was to use the sdf_copy_to function; however, that function expects an R object as input rather than a SparkDataFrame.

Any suggestions to solve this problem?

Thanks!

Paul

1 Answer


Use a temp Spark table to convert from SparkR::SparkDataFrame to sparklyr::tbl_spark.

Starting with a SparkDataFrame in SparkR

df_sparkr <- SparkR::createDataFrame(data.frame(
  x = 1:10
))
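The snippets below assume a SparkR session is already running. A minimal setup (using a local master, as an example) might look like:

```r
library(SparkR)

# Start a local Spark session for SparkR
sparkR.session(master = "local")
```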

Create a temp table in Spark

SparkR::registerTempTable(df_sparkr, "temp_df")
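Note that registerTempTable is deprecated in more recent SparkR releases; the equivalent call, assuming the same df_sparkr from above, is:

```r
# Newer SparkR API for registering a temp view
SparkR::createOrReplaceTempView(df_sparkr, "temp_df")
```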

Read the table using sparklyr

sc <- sparklyr::spark_connect(master = "local")
df_sparklyr <- dplyr::tbl(sc, "temp_df")

If your data is small, a second method is to collect it into a normal R data frame and then copy it into sparklyr. This is not recommended for large data frames.

df_normal <- SparkR::collect(df_sparkr)
df_sparklyr <- dplyr::copy_to(sc, df_normal)
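As a quick sanity check (a sketch, assuming the copy above succeeded), you can pull the tbl_spark back into R and inspect it:

```r
# Collect the sparklyr table back into an R tibble to verify the round trip
df_check <- dplyr::collect(df_sparklyr)
head(df_check)
```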
Paul
  • the registerTempTable (now createOrReplaceTempView) approach doesn't work because the Spark context is different between SparkR and sparklyr. There's no good way to do this. See [this answer](https://stackoverflow.com/a/54381583/2019736) for an explanation. – AgentBawls Oct 28 '21 at 12:46