I'm reading a dataframe from parquet file, which has nested columns (struct). How can I check if nested columns are present?

It might be like this

+----------------------+
| column1              |
+----------------------+
|{a_id:[1], b_id:[1,2]}|
+----------------------+

or like this

+---------------------+
| column1             |
+---------------------+
|{a_id:[3,5]}         |
+---------------------+

I know how to check whether a top-level column is present, as answered in [How do I detect if a Spark DataFrame has a column](https://stackoverflow.com/questions/35904136/how-do-i-detect-if-a-spark-dataframe-has-a-column):

df.schema.fieldNames.contains("column_name")

But how can I check for a nested column?

statanly

  • You can use `.printSchema()` to analyze the inferred schema. Also you can convert to a typed `Dataset` by defining `case class myClass(...)` and using `.as[myClass]` to see if it converts successfully. – Travis Hegner Mar 14 '19 at 13:35
  • [this answer](https://stackoverflow.com/a/36332079) explains it. This is the most reliable method to check for nested columns. – shanmuga Mar 14 '19 at 14:07
  • Possible duplicate of [How do I detect if a Spark DataFrame has a column](https://stackoverflow.com/questions/35904136/how-do-i-detect-if-a-spark-dataframe-has-a-column) – 10465355 Mar 14 '19 at 14:33

1 Answer

You can get the schema of the nested field as a `StructType`, and then check whether your field is present among its field names:

import org.apache.spark.sql.types.StructType

val index = df.schema.fieldIndex("column1")
val isBIdPresent = df.schema(index).dataType.asInstanceOf[StructType]
                     .fieldNames.contains("b_id")
Viacheslav Shalamov