I want to create dataframe on top of kafka topic and after that i want to register that dataframe as temp table to perform minus operation on data. I have written below code. But while querying registered table I'm getting error "org.apache.spark.sql.AnalysisException: Queries with streaming sources must be executed with writeStream.start();;"
org.apache.spark.sql.types.DataType
org.apache.spark.sql.types.StringType
import org.apache.spark.sql.types._
val df = spark.readStream.format("kafka").option("kafka.bootstrap.servers", "SERVER ******").option("subscribe", "TOPIC_NAME").option("startingOffsets", "earliest").load()
df.printSchema()
val personStringDF = df.selectExpr("CAST(value AS STRING)")
val user_schema =StructType(Array(StructField("OEM",StringType,true),StructField("IMEI",StringType,true),StructField("CUSTOMER_ID",StringType,true),StructField("REQUEST_SOURCE",StringType,true),StructField("REQUESTER",StringType,true),StructField("REQUEST_TIMESTAMP",StringType,true),StructField("REASON_CODE",StringType,true)))
val personDF = personStringDF.select(from_json(col("value"),user_schema).as("data")).select("data.*")
personDF.registerTempTable("final_df1")
spark.sql("select * from final_df1").show
ERROR:---------- "org.apache.spark.sql.AnalysisException: Queries with streaming sources must be executed with writeStream.start();;"
Also i have used start() method and I'm getting below error.
20/08/11 00:59:30 ERROR streaming.MicroBatchExecution: Query final_df1 [id = 1a3e2ea4-2ec1-42f8-a5eb-8a12ce0fb3f5, runId = 7059f3d2-21ec-43c4-b55a-8c735272bf0f] terminated with error java.lang.AbstractMethodError
NOTE: My main objective behind writing this script is i want to write minus query on this data and want to compare it with one of the register table i have on cluster. So , to summarise If I'm sending 1000 records in kafka topic from oracle database, I'm creating dataframe on top of oracle table , registering it as temp table and same I'm doing with kafka topic. Than i want to run minus query between source(oracle) and target(kafka topic). to perform 100% data validation between source and target. (Registering kafka topic as temporary table is possible?)