
I'm reading files from HDFS using the Spark Structured Streaming API, but the schema is not fixed. I'm using:

 spark.sql.streaming.schemaInference: "true"

So the schema might differ from batch to batch. If I try to select a column using:

 dataframe.select(columnName)

it fails with an AnalysisException if the column doesn't exist. Is there a way to check whether a column exists before selecting it when using the Structured Streaming API?
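One way to guard the `select` is to check the name against `dataframe.columns` (a plain list of top-level column names, available on streaming DataFrames too) and substitute a null literal for anything missing, so the output columns stay constant. A minimal sketch, assuming PySpark; the helpers `resolve_column` and `safe_select` are hypothetical, and the string-typed null placeholder is an arbitrary choice:

```python
from typing import List, Optional


def resolve_column(available: List[str], name: str,
                   case_sensitive: bool = False) -> Optional[str]:
    """Return the column in `available` that matches `name`, or None if absent.

    Spark resolves names case-insensitively by default (spark.sql.caseSensitive
    is false), so the default here does the same. Only top-level names are
    checked, because that is all DataFrame.columns lists.
    """
    if case_sensitive:
        return name if name in available else None
    lowered = name.lower()
    for col in available:
        if col.lower() == lowered:
            return col
    return None


def safe_select(df, wanted: List[str]):
    """Select `wanted` from a PySpark DataFrame, using nulls for absent columns.

    pyspark is imported here so resolve_column can be used without it.
    """
    from pyspark.sql import functions as F

    exprs = []
    for name in wanted:
        actual = resolve_column(df.columns, name)
        if actual is None:
            # Assumption: a string-typed null; pick the type your sink expects.
            exprs.append(F.lit(None).cast("string").alias(name))
        else:
            exprs.append(F.col(actual).alias(name))
    return df.select(*exprs)
```

`safe_select(dataframe, [columnName])` then stands in for `dataframe.select(columnName)`. Filling with nulls keeps the output schema fixed; skipping missing names instead would let the output schema vary.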

Mahmoud Hanafy
  • Does this answer your question? [How do I detect if a Spark DataFrame has a column](https://stackoverflow.com/questions/35904136/how-do-i-detect-if-a-spark-dataframe-has-a-column) – Gokulraj Aug 01 '21 at 17:39
  • The problem with streaming is that the schema changes with every batch. I think the solution in that question will only work for the first batch. – Mahmoud Hanafy Aug 02 '21 at 10:53
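Regarding the per-batch concern: `foreachBatch` (PySpark 2.4+) passes each micro-batch to a function as an ordinary DataFrame, so `batch_df.columns` is read again on every call rather than once when the query is defined. A minimal sketch, assuming PySpark and JSON input; `WANTED`, `split_columns`, `process_batch`, `start_query` and the output and checkpoint paths are hypothetical:

```python
from typing import List, Tuple

# Hypothetical: the columns this job wants to keep.
WANTED = ["id", "event_type"]


def split_columns(available: List[str],
                  wanted: List[str]) -> Tuple[List[str], List[str]]:
    """Return (present, missing), both in the order given by `wanted`."""
    present = [c for c in wanted if c in available]
    missing = [c for c in wanted if c not in available]
    return present, missing


def process_batch(batch_df, batch_id: int) -> None:
    """Called once per micro-batch; batch_df is a regular (non-streaming) DataFrame."""
    present, missing = split_columns(batch_df.columns, WANTED)
    if missing:
        print(f"batch {batch_id}: columns not found, skipped: {missing}")
    if present:
        # Hypothetical sink path; replace with the real output.
        batch_df.select(*present).write.mode("append").parquet("/tmp/output")


def start_query(spark, input_path: str, checkpoint_path: str):
    """Start the stream; assumes schemaInference is enabled and the input is JSON."""
    stream_df = spark.readStream.format("json").load(input_path)
    return (stream_df.writeStream
            .option("checkpointLocation", checkpoint_path)
            .foreachBatch(process_batch)
            .start())
```

As far as I know, a streaming query's schema is fixed once the streaming DataFrame is created (file-source inference runs at that point), so the columns seen here may not actually change between batches; the check is cheap either way.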
