I have a supposedly simple problem that I've been stuck on for a while. My DataFrame has this schema:

root
 |-- _id: string (nullable = true)
 |-- created: timestamp (nullable = true)
 |-- geometry: struct (nullable = true)
 |    |-- coordinates: string (nullable = false)
 |    |-- type: string (nullable = false)

How do I extract the 'coordinates' and 'type' strings? (I know they shouldn't be strings, but I'm looking to change their types after isolating them.)

I'm using Databricks 3.0, Spark 2.2. Thank you!

Ashley O
  • Is it about Scala, `df.select($"coordinates", $"type")`, or Python, `df.select(df['coordinates'], df['type'])`? –  Aug 31 '17 at 20:45
  • oh wow. I was trying to do it in Python but I just moved the data over to a temp table and was able to use this code to extract the elements. I think I'm good? Thank you! – Ashley O Aug 31 '17 at 20:49
  • `df.select(df.geometry.coordinates)`. You can also flatten both at once using `*`: `df.select(psf.col("geometry.*"))`, where `psf` is an alias for the PySpark functions module (`import pyspark.sql.functions as psf`). It will expand the `StructType` into two separate columns. – MaFF Aug 31 '17 at 21:52
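
Putting the comments together, here is a minimal self-contained PySpark sketch; the sample row and session setup are illustrative assumptions, not from the question:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as psf

spark = SparkSession.builder.appName("extract-struct-fields").getOrCreate()

# Build a DataFrame matching the question's geometry struct; the values are made up.
df = spark.createDataFrame(
    [("abc123", "[102.0, 0.5]", "Point")],
    ["_id", "coordinates", "type"],
).select(
    "_id",
    psf.struct("coordinates", "type").alias("geometry"),
)

# Pull out individual struct fields with dot notation...
df.select(psf.col("geometry.coordinates"), psf.col("geometry.type")).show()

# ...or flatten the whole struct into top-level columns in one go.
df.select(psf.col("geometry.*")).show()
```

Either select returns plain string columns, which can then be converted with `psf.col(...).cast(...)` as the question intends.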

0 Answers