
I'm trying to create an external table stored as Avro, using the HiveContext of pyspark. The external-table creation query runs fine in Hive. However, the same query fails in HiveContext with the error: org.apache.hadoop.hive.serde2.SerDeException: Encountered exception determining schema. Returning signal schema to indicate problem: null

My Avro schema is as follows.

{
  "type" : "record",
  "name" : "test_table",
  "namespace" : "com.ent.dl.enh.test_table",
  "fields" : [ {
    "name" : "column1",
    "type" : [ "null", "string" ] , "default": null
  }, {
    "name" : "column2",
    "type" : [ "null", "string" ] , "default": null
  }, {
    "name" : "column3",
    "type" : [ "null", "string" ] , "default": null
  }, {
    "name" : "column4",
    "type" : [ "null", "string" ] , "default": null
  } ]
}
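
As far as I can tell the schema itself is valid. A quick way to double-check it locally (a sketch only, assuming the avro Python package is installed and test_table.avsc is the same file that was uploaded to S3) would be:

# Sanity check: parse the local copy of the .avsc before pointing Hive at it.
# Assumes the "avro" package; the older avro-python3 package names the function Parse().
import avro.schema

with open("test_table.avsc") as f:
    schema = avro.schema.parse(f.read())  # raises if the file is not a valid Avro schema

print(schema.name)       # test_table
print(schema.namespace)  # com.ent.dl.enh.test_table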

My create-table script is:

CREATE EXTERNAL TABLE test_table_enh
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 's3://Staging/test_table/enh'
TBLPROPERTIES ('avro.schema.url'='s3://Staging/test_table/test_table.avsc')
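
A variant of the same statement embeds the schema inline via avro.schema.literal instead of avro.schema.url, so the SerDe never has to fetch the .avsc from S3. The sketch below (assuming a local copy of the .avsc) only builds the DDL string; it could then be passed to the HiveContext from the script further down:

import json

# Sketch: embed the Avro schema inline via avro.schema.literal so the
# AvroSerDe does not have to resolve test_table.avsc from S3.
with open("test_table.avsc") as f:
    schema_literal = json.dumps(json.load(f))  # compact, single-line JSON

ddl = """
CREATE EXTERNAL TABLE test_table_enh
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 's3://Staging/test_table/enh'
TBLPROPERTIES ('avro.schema.literal' = '%s')
""" % schema_literal

# ddl can then be executed with hive_context.sql(ddl) as in the script below.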

I'm running the code below using spark-submit:

from pyspark import SparkContext
from pyspark.sql import HiveContext

print("Start of program")
sc = SparkContext()
hive_context = HiveContext(sc)

hive_context.sql("""
    CREATE EXTERNAL TABLE test_table_enh
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    LOCATION 's3://Staging/test_table/enh'
    TBLPROPERTIES ('avro.schema.url'='s3://Staging/test_table/test_table.avsc')
""")

print("end")

Spark version: 2.2.0
OpenJDK version: 1.8.0
Hive version: 2.3.0
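
In case it is relevant: HiveContext is deprecated in Spark 2.x in favour of SparkSession with Hive support. A minimal SparkSession-based equivalent (a sketch for comparison only; the app name is arbitrary) would be:

from pyspark.sql import SparkSession

# Sketch: SparkSession with Hive support is the Spark 2.x replacement for
# HiveContext; the DDL itself is unchanged.
spark = SparkSession.builder \
    .appName("create_test_table_enh") \
    .enableHiveSupport() \
    .getOrCreate()

spark.sql("""
    CREATE EXTERNAL TABLE test_table_enh
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    LOCATION 's3://Staging/test_table/enh'
    TBLPROPERTIES ('avro.schema.url'='s3://Staging/test_table/test_table.avsc')
""")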
