In Python I have a Spark DataFrame with nested columns. Given the path a.b.c, I want to check whether there is a nested field after c called d, i.e. whether a.b.c.d exists.
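
For reference, a minimal toy DataFrame with this kind of nesting can be built like this (the names a, b, c, d and the StringType leaf are just placeholders for illustration):

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType

    spark = SparkSession.builder.getOrCreate()

    # Nested schema matching the path a.b.c.d from the question
    schema = StructType([
        StructField("a", StructType([
            StructField("b", StructType([
                StructField("c", StructType([
                    StructField("d", StringType()),
                ])),
            ])),
        ])),
    ])
    df = spark.createDataFrame([], schema)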
Simply checking df.columns['a']['b']['c']['d'] or df.columns['a.b.c.d'] doesn't work, since df.columns is just a flat list of the top-level column names. So I found that the df.schema attribute can be used instead.
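
To illustrate, with the toy DataFrame above:

    print(df.columns)          # ['a'] -- just a flat list of top-level names
    # df.columns['a.b.c.d']   # TypeError: list indices must be integers or slices, not str
    # df.columns['a']['b']    # same problem: 'a' is not a valid list index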
So I just walk down the schema, e.g.:

    y = df.schema['a'].dataType['b'].dataType['c'].dataType
and then I should just be able to check whether d is in y.
The way I did it is simply to try y['d'] and, if that raises an exception, conclude that the field doesn't exist. But I don't think wrapping this in try/except is the best way.
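
Written out, what I'm doing is basically this (has_field is just a throwaway name I made up; I believe the lookup raises a KeyError when the name is missing):

    def has_field(struct, name):
        # Relies on StructType raising KeyError for a missing field name
        try:
            struct[name]
            return True
        except KeyError:
            return False

    print(has_field(y, 'd'))  # True if a.b.c.d exists
    print(has_field(y, 'x'))  # False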
So I tried checking 'd' in y, but apparently this doesn't work, even though retrieving the element with y['d'] works when it exists.
For reference, y is a StructType and prints as StructType(List(StructField(d,StringType,true), ...other columns)).
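
Poking at it a bit, lookup by name and iteration seem to behave differently, which I guess is related:

    print(y['d'])    # StructField(d,StringType,true) -- lookup by name works
    print('d' in y)  # False
    print(list(y))   # iterating over y yields StructField objects, not name strings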
So I don't really know how to properly check whether d is in y. Why can't I check 'd' in y directly when retrieving y['d'] works? Can anyone help? I'm also new to Python, but I can't find or think of another solution.