I have a data frame `df` whose schema looks like this:

root
 |-- users: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- id: string (nullable = true)
 |    |    |-- ok: boolean (nullable = true)
 |    |    |-- attributes: struct (nullable = true)
 |    |    |    |-- array1: array (nullable = true)
 |    |    |    |    |-- element: string (containsNull = true)
 |    |    |    |-- groupid: string (nullable = true)
 |    |    |    |-- array2: array (nullable = true)
 |    |    |    |    |-- element: string (containsNull = true)
 |    |    |    |-- array3: array (nullable = true)
 |    |    |    |    |-- element: string (containsNull = true)
 |    |    |    |-- array4: array (nullable = true)
 |    |    |    |    |-- element: string (containsNull = true)

I want to access and analyze the values of array1, array2, array3 and array4. I tried:

df.users.attributes.array1

It gives me this error:

  AttributeError: 'Series' object has no attribute 'attributes'

How can I access the values within these arrays (array1, array2, array3 and array4)?

ComplexData
  • 2
    Data frame "schema"? I've never heard that term before. This sounds like you are trying to use a dataframe like a sql database... I don't know where your schema is coming from, but why don't you show us the result of `print(df.head(10))` and `df.dtypes` – juanpa.arrivillaga Jul 27 '17 at 20:46
  • 2
    Also, please consider some of the answers [here](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) to help improve the quality of your question. Your schema makes little sense. Generally, you do **not** use arrays as elements of your data-frame, nor structs... – juanpa.arrivillaga Jul 27 '17 at 20:48
  • Take @juanpa.arrivillaga's advice please – piRSquared Jul 27 '17 at 22:47
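
The `AttributeError` suggests that `df` is a pandas DataFrame: `df.users` is then a Series, and a Series has no `.attributes` field to chain into. Below is a minimal sketch of one way to reach the nested arrays. It assumes that each `users` cell holds a list of dicts mirroring the schema above and that missing values are `None`. The sample rows and the helper name `flatten_users` are hypothetical, made up for illustration.

```python
# Hypothetical sample rows mirroring the schema: each row's "users" value is
# a list of dicts (structs), each with a nested "attributes" dict.
rows = [
    [
        {"id": "u1", "ok": True,
         "attributes": {"array1": ["a", "b"], "groupid": "g1",
                        "array2": ["c"], "array3": [], "array4": ["d"]}},
        {"id": "u2", "ok": False, "attributes": None},  # attributes is nullable
    ],
    None,  # the users array itself is nullable
]

ARRAY_FIELDS = ("array1", "array2", "array3", "array4")


def flatten_users(rows):
    """Return one flat dict per user, with missing arrays replaced by []."""
    flat = []
    for users in rows:
        for user in users or []:      # skip rows where users is None
            if user is None:          # array elements may be null too
                continue
            attrs = user.get("attributes") or {}
            record = {"id": user.get("id"), "groupid": attrs.get("groupid")}
            for name in ARRAY_FIELDS:
                record[name] = attrs.get(name) or []
            flat.append(record)
    return flat


flat = flatten_users(rows)
```

With a real pandas DataFrame, the input could come from `df["users"].tolist()`, and `pd.DataFrame(flat)` would give one row per user with `array1` to `array4` as ordinary columns. If missing cells are `NaN` rather than `None`, they would need to be filtered out first, since the sketch only handles `None`.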

0 Answers