
I am trying to add a new column to each row of a DataFrame like this:

  def addNamespace(iter: Iterator[Row]): Iterator[Row] = {
    iter.map { row =>
      println(row.getString(0))
      // Row.fromSeq(row.toSeq ++ Array[String]("shared"))
      val newseq = row.toSeq ++ Array[String]("shared")
      Row(newseq: _*)
    }
  }

  def transformDf(source: DataFrame)(implicit spark: SparkSession): DataFrame = {
    val newSchema = StructType(source.schema.fields ++ Array(StructField("namespace", StringType, nullable = true)))
    val df = spark.sqlContext.createDataFrame(source.rdd.mapPartitions(addNamespace), newSchema)
    df.show()
    df
  }

But I keep getting this error on the line df.show():

    Caused by: java.lang.RuntimeException: org.apache.spark.unsafe.types.UTF8String is not a valid external type for schema of string

Can somebody please help me figure this out? I have searched around in multiple posts, but whatever I have tried gives me this error.

I have also tried val again = sourceDF.withColumn("namespace", functions.lit("shared")), but it hits the same issue.

Schema of the already-read data:

root
 |-- name: string (nullable = true)
 |-- data: struct (nullable = true)
 |    |-- name: string (nullable = true)
 |    |-- description: string (nullable = true)
 |    |-- activates_on: timestamp (nullable = true)
 |    |-- expires_on: timestamp (nullable = true)
 |    |-- created_by: string (nullable = true)
 |    |-- created_on: timestamp (nullable = true)
 |    |-- updated_by: string (nullable = true)
 |    |-- updated_on: timestamp (nullable = true)
 |    |-- properties: map (nullable = true)
 |    |    |-- key: string
 |    |    |-- value: string (valueContainsNull = true)
  • There is something wrong with the actual data. Check for special characters, backslashes and spaces. Even if one of the records has this issue, you won't be able to create the dataframe. The error means the actual data is not in line with the sort of data you should provide for a String type. – a9207 Jun 28 '19 at 18:17
  • @Anamdeo I am reading data from Cassandra and want to add a new column of StringType which has a fixed string value. So even if I have some special characters in the already-read data, I won't be able to create the dataframe? – KingJames Jun 28 '19 at 18:27
  • Can you print the schema of the already-read data and paste it here? – a9207 Jun 28 '19 at 19:01
  • @Anamdeo updated the already-read schema in the question – KingJames Jun 28 '19 at 20:11
  • @Anamdeo is right; see my answer and solution for this. – Ram Ghadiyaram Jun 29 '19 at 03:09
  • @RamGhadiyaram I don't understand your solution. I'm having the same error; can you post your solution instead of linking to other SO answers? – Aaron Stainback Jul 01 '20 at 18:21

1 Answer


Caused by: java.lang.RuntimeException: org.apache.spark.unsafe.types.UTF8String is not a valid external type for schema of string

means that Spark is unable to treat the value as a string type for the newly added "namespace" column.

This clearly indicates a datatype mismatch at the Catalyst level.
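For illustration (not from the original post), here is a minimal sketch that reproduces the same failure by handing createDataFrame an internal UTF8String where an external java.lang.String is expected:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{StringType, StructField, StructType}
    import org.apache.spark.unsafe.types.UTF8String

    // Assumes an existing SparkSession named spark.
    val schema = StructType(Seq(StructField("name", StringType, nullable = true)))
    val rdd = spark.sparkContext.parallelize(Seq(Row(UTF8String.fromString("shared"))))

    // Throws when the action runs:
    // java.lang.RuntimeException: org.apache.spark.unsafe.types.UTF8String
    // is not a valid external type for schema of string
    spark.createDataFrame(rdd, schema).show()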

See the Spark source that throws this error (the ValidateExternalType check in Catalyst):

override def eval(input: InternalRow): Any = {
  val result = child.eval(input)
  if (checkType(result)) {
    result
  } else {
    throw new RuntimeException(s"${result.getClass.getName}$errMsg")
  }
}

where errMsg expands to s" is not a valid external type for schema of ${expected.catalogString}".

So a UTF8String is not a plain Java String; you need to decode it to (or encode it from) a java.lang.String before passing it where a string type is expected, otherwise Catalyst will not understand what you are passing.
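For reference, a minimal sketch of that conversion (UTF8String.fromString and UTF8String.toString are the converters in org.apache.spark.unsafe.types):

    import org.apache.spark.unsafe.types.UTF8String

    val internal: UTF8String = UTF8String.fromString("shared") // encode: String -> UTF8String
    val external: String     = internal.toString               // decode: UTF8String -> String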

How to fix it?

Below is SO content that addresses how to convert between UTF8String and String; apply whichever solution fits your case.

https://stackoverflow.com/a/5943395/647053 (string decode utf-8)

Note: an online UTF-8 encoder/decoder tool is very handy for pasting in sample data and converting it to a string. Try that first.
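Putting this together, here is a hedged sketch (my rewrite, not tested against the questioner's Cassandra data) of the question's transformDf that decodes any top-level UTF8String values to plain strings before rebuilding each Row. Nested struct and map fields would need a recursive variant; the "namespace" column and "shared" literal follow the question:

    import org.apache.spark.sql.{DataFrame, Row, SparkSession}
    import org.apache.spark.sql.types.{StringType, StructField, StructType}
    import org.apache.spark.unsafe.types.UTF8String

    def transformDf(source: DataFrame)(implicit spark: SparkSession): DataFrame = {
      val newSchema = StructType(source.schema.fields :+ StructField("namespace", StringType, nullable = true))
      val rows = source.rdd.map { row =>
        // Decode any internal UTF8String values to external java.lang.String.
        val cleaned = row.toSeq.map {
          case u: UTF8String => u.toString
          case other         => other
        }
        Row.fromSeq(cleaned :+ "shared")
      }
      spark.createDataFrame(rows, newSchema)
    }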

  • Hello. It sounds useful. I haven't got a chance to try it yet. I will try it once I get some time and update. Thanks! – KingJames Jul 02 '19 at 02:32