Check if column exists in Spark when reading files in structured streaming

Asked Jul 28 '21 at 08:25

Active Jul 28 '21 at 08:25

Viewed 518 times

I'm reading files from HDFS using the Spark Structured Streaming API, but the schema is not fixed. I'm using:

 sql.streaming.schemaInference: "true"

so the schema might be different for every batch. If I try to select a column using:

 dataframe.select(columnName)

it will complain if the column doesn't exist. Is there a way to check whether a column exists before selecting it when using the Structured Streaming API?
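To make the kind of check being asked about concrete, here is a minimal Python sketch. It assumes the stream is processed with `foreachBatch`, so that each micro-batch arrives as an ordinary DataFrame whose `columns` list can be inspected before calling `select`. The helper names `select_existing` and `process_batch` are hypothetical, and whether the batch schema really differs between batches depends on how the source infers it, which this sketch does not verify.

```python
def select_existing(df, wanted):
    """Select only the names in `wanted` that appear in df.columns.

    Hypothetical helper: relies only on the DataFrame's `columns`
    attribute and `select` method. Returns None when none are present.
    """
    available = set(df.columns)  # top-level column names only
    present = [name for name in wanted if name in available]
    return df.select(*present) if present else None


def process_batch(batch_df, batch_id):
    """Hypothetical handler, meant to be passed to foreachBatch."""
    selected = select_existing(batch_df, ["columnName"])
    if selected is None:
        return  # column absent in this batch, so skip it
    # Write or further transform `selected` here.


# Assumed wiring (not executed here):
#   stream_df.writeStream.foreachBatch(process_batch).start()
```

The comparison is an exact, case-sensitive match on top-level names, so nested fields are not detected this way.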

asked Jul 28 '21 at 08:25

Mahmoud Hanafy


Does this answer your question? [How do I detect if a Spark DataFrame has a column](https://stackoverflow.com/questions/35904136/how-do-i-detect-if-a-spark-dataframe-has-a-column) – Gokulraj Aug 01 '21 at 17:39
The problem with streaming is that the schema changes for every batch. I think the solution in that question will only work for the first batch. – Mahmoud Hanafy Aug 02 '21 at 10:53

0 Answers