In Python I have a Spark DataFrame with nested columns. Given the path a.b.c, I want to check whether there is a nested field after c called d, i.e. whether a.b.c.d exists.
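
For reference, a minimal toy DataFrame with this kind of nesting can be built like this (the names a, b, c, d and the StringType leaf are just placeholders for illustration):

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType

    spark = SparkSession.builder.getOrCreate()

    # Nested schema matching the path a.b.c.d from the question
    schema = StructType([
        StructField("a", StructType([
            StructField("b", StructType([
                StructField("c", StructType([
                    StructField("d", StringType()),
                ])),
            ])),
        ])),
    ])
    df = spark.createDataFrame([], schema)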
Simply checking df.columns['a']['b']['c']['d'] or df.columns['a.b.c.d'] doesn't work, since df.columns is just a flat list of the top-level column names. So I found that the df.schema attribute can be used instead.
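
To illustrate, with the toy DataFrame above:

    print(df.columns)          # ['a'] -- just a flat list of top-level names
    # df.columns['a.b.c.d']   # TypeError: list indices must be integers or slices, not str
    # df.columns['a']['b']    # same problem: 'a' is not a valid list index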
So I just walk down the schema, e.g.:

    y = df.schema['a'].dataType['b'].dataType['c'].dataType
and then I should just be able to check whether d is in y.
The way I did it is simply to try y['d'] and, if that raises an exception, conclude that the field doesn't exist. But I don't think wrapping this in try/except is the best way.
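
Written out, what I'm doing is basically this (has_field is just a throwaway name I made up; I believe the lookup raises a KeyError when the name is missing):

    def has_field(struct, name):
        # Relies on StructType raising KeyError for a missing field name
        try:
            struct[name]
            return True
        except KeyError:
            return False

    print(has_field(y, 'd'))  # True if a.b.c.d exists
    print(has_field(y, 'x'))  # False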
So I tried checking 'd' in y, but apparently this doesn't work, even though retrieving the element with y['d'] works when it exists.
For reference, y is a StructType and prints as StructType(List(StructField(d,StringType,true), ...other columns)).
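
Poking at it a bit, lookup by name and iteration seem to behave differently, which I guess is related:

    print(y['d'])    # StructField(d,StringType,true) -- lookup by name works
    print('d' in y)  # False
    print(list(y))   # iterating over y yields StructField objects, not name strings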
So I don't really know how to properly check whether d is in y. Why can't I check 'd' in y directly when retrieving y['d'] works? Can anyone help? I'm also new to Python, but I can't find or think of another solution.