I am trying to add a new column in each row of DataFrame like this
def addNamespace(iter: Iterator[Row]): Iterator[Row] = {
iter.map (row => {
println(row.getString(0))
// Row.fromSeq(row.toSeq ++ Array[String]("shared"))
val newseq = row.toSeq ++ Array[String]("shared")
Row(newseq: _*)
})
iter
}
def transformDf(source: DataFrame)(implicit spark: SparkSession): DataFrame = {
val newSchema = StructType(source.schema.fields ++ Array(StructField("namespace", StringType, nullable = true)))
val df = spark.sqlContext.createDataFrame(source.rdd.mapPartitions(addNamespace), newSchema)
df.show()
df
}
But I keep getting this error - Caused by: java.lang.RuntimeException: org.apache.spark.unsafe.types.UTF8String is not a valid external type for schema of string
on the line df.show()
Can somebody please help in figuring out this. I have searched around in multiple posts but whatever I have tried is giving me this error.
I have also tried val again = sourceDF.withColumn("namespace", functions.lit("shared"))
but it has the same issue.
Schema of already read data
root
|-- name: string (nullable = true)
|-- data: struct (nullable = true)
| |-- name: string (nullable = true)
| |-- description: string (nullable = true)
| |-- activates_on: timestamp (nullable = true)
| |-- expires_on: timestamp (nullable = true)
| |-- created_by: string (nullable = true)
| |-- created_on: timestamp (nullable = true)
| |-- updated_by: string (nullable = true)
| |-- updated_on: timestamp (nullable = true)
| |-- properties: map (nullable = true)
| | |-- key: string
| | |-- value: string (valueContainsNull = true)