
I am having some difficulty mapping a function over the rows of a DataFrame and then converting the result back to a new DataFrame.

So far I have

  val intrdd = df.rdd.map(row => processRow(row))

  val processeddf = intrdd.toDF

However, this does not compile, because `toDF` is not available for an `RDD[Row]` (there is no implicit `Encoder` for `Row`).

Is there a good way to do this?

Note: I am on Spark 2.2.0, so I would like to avoid SQLContext and use only SparkSession.

Thanks.

  • https://stackoverflow.com/questions/37011267/how-to-convert-an-rddrow-back-to-dataframe – Raphael Roth Jan 19 '18 at 18:44
  • Is there a way to do it with Spark 2.0.0+ and SparkSession? That answer says: "2) You can use `createDataFrame(rowRDD: RDD[Row], schema: StructType)`, which is available in the SQLContext object. Example: `val df = oldDF.sqlContext.createDataFrame(rdd, oldDF.schema)`. Note that there is no need to explicitly set any schema column. We reuse the old DF's schema, which is of StructType class and can be easily extended. However, this approach sometimes is not possible, and in some cases can be less efficient than the first one." – user48944 Jan 19 '18 at 18:47
  • you can still use sqlContext in spark 2: `sparkSession.sqlContext` – Raphael Roth Jan 19 '18 at 18:48
  • `val ss = SparkSession.builder().appName("ImputationApp").enableHiveSupport().getOrCreate(); val sqlContext = new org.apache.spark.sql.SQLContext(ss)` — doesn't seem to work. – user48944 Jan 19 '18 at 18:50
  • no, `ss.sqlContext` will give you the sqlContext – Raphael Roth Jan 19 '18 at 18:51
  • It Works. Thanks. – user48944 Jan 19 '18 at 19:01
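Putting the comments together, a minimal self-contained sketch of the fix: since Spark 2.0, `SparkSession` itself exposes `createDataFrame(rowRDD, schema)`, so no explicit SQLContext is needed. The app name and `processRow` body below are illustrative placeholders, not from the original post.

```scala
import org.apache.spark.sql.{Row, SparkSession}

object ConvertRddExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ImputationApp") // name is illustrative
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (2, "b")).toDF("id", "label")

    // Hypothetical per-row transformation; it must return a Row that
    // still conforms to df.schema for createDataFrame below to succeed.
    def processRow(row: Row): Row = row

    val intrdd = df.rdd.map(row => processRow(row))

    // toDF is unavailable for RDD[Row] (no Encoder[Row]), but
    // SparkSession.createDataFrame accepts an RDD[Row] plus a schema.
    // Reusing df.schema avoids redeclaring the columns.
    val processeddf = spark.createDataFrame(intrdd, df.schema)
    processeddf.show()

    spark.stop()
  }
}
```

Alternatively, as noted in the comments, `spark.sqlContext` still gives you a `SQLContext` in Spark 2.x if an API requires one; constructing a `SQLContext` directly from a `SparkSession` does not compile.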
