My dataframe looks like this. Every column holds a list, and the values that belong to one entity sit at the same index of the list in all of the columns shown:
column_1 | [2022-08-05 03:38...
column_2 | [inside, inside, ...
column_3 | [269344c6-c01c-45...
column_4 | [ff870660-57ce-11...
column_5 | [Mannheim, Mannhe...
column_6 | [26, 21, 2, 8]
column_7 | [fa8103a0-57ce-11...
column_8 | [ATG1, ATG3, Variable1...
My Approach:
from pyspark.sql import functions as F

# Get the column names
df_colum_names = list(df.schema.names)
# Build the condition as a SQL expression
filter_func = "filter(geofenceeventtype,spatial_wi_df -> df.column_8 == 'Variable1')"
geofence_expr = f"transform(sort_array({filter_func}), x -> x."
geofence_prefix = "geofence_sorted"
# Extract the values into new columns
for col in df_colum_names:
    df = df.withColumn(
        geofence_prefix + col,
        F.element_at(
            F.expr(geofence_expr + col.replace("_", ".") + ")"), 1),
    )
In this way I want to create columns that contain only the values belonging to the entity 'Variable1', and then drop all rows that have no data for this entity.
The error message:
Can't extract value from lambda df#2345: need struct type but got string
So there seem to be rows where the value of the column is a single string rather than a StructType. How can I deal with this problem?
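
One possible direction, as a rough sketch rather than a confirmed fix: because the lists are index-aligned, they could be zipped into a single array of structs with `arrays_zip`, filtered on the `column_8` field, and then unpacked into new columns. This assumes every listed column is actually array-typed (`df.printSchema()` would confirm that) and that `arrays_zip` names the struct fields after the source columns. The helper names `build_entity_filter` and `extract_entity`, the temporary column `_matches`, and the trailing underscore in the prefix are hypothetical. Unlike the attempt above, the sketch does not use `sort_array`.

```python
def build_entity_filter(columns, entity_column, entity):
    """Build a Spark SQL expression that zips the index-aligned array
    columns into an array of structs and keeps only the structs whose
    `entity_column` field equals `entity`.

    Assumption: `entity` contains no quote characters.
    """
    zipped = "arrays_zip(" + ", ".join(columns) + ")"
    return f"filter({zipped}, x -> x.{entity_column} = '{entity}')"


def extract_entity(df, columns, entity_column="column_8",
                   entity="Variable1", prefix="geofence_sorted_"):
    """Add one `prefix + column` column per input column, holding the
    value of the first list entry that belongs to `entity`, and drop
    rows in which the entity does not occur."""
    # Imported here so the expression builder above can be tried
    # without a Spark installation.
    from pyspark.sql import functions as F

    matches = "_matches"  # hypothetical temporary column name
    df = df.withColumn(
        matches, F.expr(build_entity_filter(columns, entity_column, entity))
    )
    # Keep only rows with at least one matching entry.
    df = df.filter(F.size(matches) > 0)
    for name in columns:
        # [0] selects the first matching struct, [name] the field that
        # (by assumption) carries the source column's name.
        df = df.withColumn(prefix + name, F.col(matches)[0][name])
    return df.drop(matches)
```

Usage would be along the lines of `df = extract_entity(df, list(df.schema.names))`. If one of the columns turns out to be a plain string column, `arrays_zip` is not applicable to it as written, so that column would have to be left out of `columns` or converted first.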