My DataFrame looks like this. Every column holds a list, and the values that belong to one entity sit at the same list index in all of the columns shown.

column_1                                       | [2022-08-05 03:38...
column_2                                       | [inside, inside, ...
column_3                                       | [269344c6-c01c-45...
column_4                                       | [ff870660-57ce-11...
column_5                                       | [Mannheim, Mannhe...
column_6                                       | [26, 21, 2, 8]      
column_7                                       | [fa8103a0-57ce-11...
column_8                                       | [ATG1, ATG3, Variable1...

My Approach:

from pyspark.sql import functions as F

# Get the column names
df_colum_names = list(df.schema.names)

# Build the filter condition as a SQL expression
filter_func = "filter(geofenceeventtype, spatial_wi_df -> df.column_8 == 'Variable1')"
geofence_expr = f"transform(sort_array({filter_func}), x -> x."
geofence_prefix = "geofence_sorted"

# Extract the first matching value into a new column per source column
for col in df_colum_names:
    df = df.withColumn(
        geofence_prefix + col,
        F.element_at(
            F.expr(geofence_expr + col.replace("_", ".") + ")"), 1
        ),
    )

This way I want to create new columns that contain only the values belonging to the entity 'Variable1', and then drop all rows that have no data for this entity.
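To make the goal concrete, the intended per-row result can be sketched in plain Python. This is not Spark code and not a fix for the error. The `row` dict, the helper name `select_entity`, and the sample values (apart from the `column_6` list) are assumptions made up for illustration.

```python
# One DataFrame row, modelled as a dict of index-aligned lists.
# Placeholder values: only the column_6 list is taken from the table above.
row = {
    "column_2": ["inside", "inside", "outside", "inside"],
    "column_6": [26, 21, 2, 8],
    "column_8": ["ATG1", "ATG3", "Variable1", "ATG2"],
}


def select_entity(row, key_column, entity):
    """Return the value each column holds at the entity's index, or None."""
    keys = row[key_column]
    if entity not in keys:
        return None  # rows like this should be dropped afterwards
    idx = keys.index(entity)  # position of the first occurrence of the entity
    return {name: values[idx] for name, values in row.items()}


result = select_entity(row, "column_8", "Variable1")
print(result)
```

The point of the sketch is that the index found in `column_8` selects the matching element from every other list in the same row.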

The error message:

Can't extract value from lambda df#2345: need struct type but got string

So apparently there are rows where the column value is a single string rather than a StructType. How can I deal with this problem?

pi_janes