4

This is part of the schema of my mongodb collection:

|-- variables: struct (nullable = true)  
|    |-- actives: struct (nullable = true)  
|    |    |-- data: struct (nullable = true)  
|    |    |    |-- 0: struct (nullable = true)  
|    |    |    |    |--active: integer (nullable = true)  
|    |    |    |    |-- inactive: integer (nullable = true)

I've fetched the collection and stored it in a Spark dataframe and am now trying to extract the innermost values in the variables column.

df_temp = df1.select(df1.variables.actives.data)

This works perfectly fine and I am able to get the inner structure of the data struct.

+----------------------+  
|variables.actives.data|  
+----------------------+  
|  [[1,32,0.516165...|  
|  [[1,30,1.173139...|  
|  [[4,18,0.160088...|

However, as soon as I try to go in further:

df_temp = df1.select(df1.variables.actives.data.0.active)

I get an invalid syntax error.

df_temp = df1.select(df1.variables.actives.data.0.active)
^
SyntaxError: invalid syntax

The problem is with my inner field's key's name being a number and I couldn't find an example where the inner field key's name is a number.

What would be the best way to achieve my goal of retrieving the innermost values (active and inactive) from the dataframe?

bigbong
  • 541
  • 4
  • 12
  • 1
    Why not simply use `df1.select("variables.actives.data.0.active")`? – rohitkulky Jan 02 '18 at 15:37
  • Voted to reopen since this question refers to pyspark (in Python), not Spark (in Scala) – Danny Varod Jan 20 '21 at 11:19
  • Since question is closed, I'll add my answer as a comment: You can select columns from arrays using `element_at('array_col_name', one_based_index).alias('new_col_name)` and columns from structures using `col('struct_col_name.struct_field_name').alias('new_col_name')` and put all these new columns into a `select` e.g. `result_df = df.select(*array_of_new_cols)` – Danny Varod Jan 20 '21 at 11:22

1 Answers1

5

You can try:

df_temp = df1.select(df1.variables.actives.data["0"].active)
mtoto
  • 23,919
  • 4
  • 58
  • 71