I have a supposedly simple problem that I've been stuck on for a while. My DataFrame has this schema:

root
 |-- _id: string (nullable = true)
 |-- created: timestamp (nullable = true)
 |-- geometry: struct (nullable = true)
 |    |-- coordinates: string (nullable = false)
 |    |-- type: string (nullable = false)

How do I extract the 'coordinates' and 'type' strings? (I know they shouldn't be strings, but I'm looking to change their types after isolating them.)

I'm using Databricks 3.0, Spark 2.2. Thank you!

Ashley O
  • Is it about Scala, `df.select($"coordinates", $"type")`, or Python, `df.select(df['coordinates'], df['type'])`? –  Aug 31 '17 at 20:45
  • oh wow. I was trying to do it in Python but I just moved the data over to a temp table and was able to use this code to extract the elements. I think I'm good? Thank you! – Ashley O Aug 31 '17 at 20:49
  • `df.select(df.geometry.coordinates)`. You can also flatten both at once using `*`: `df.select(psf.col("geometry.*"))`, where `psf` is an alias for the PySpark functions module (`import pyspark.sql.functions as psf`). It will expand the `StructType` into two separate columns. – MaFF Aug 31 '17 at 21:52
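
Putting the comments together, here is a minimal self-contained PySpark sketch; the sample row and session setup are illustrative assumptions, not from the question:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as psf

spark = SparkSession.builder.appName("extract-struct-fields").getOrCreate()

# Build a DataFrame matching the question's geometry struct; the values are made up.
df = spark.createDataFrame(
    [("abc123", "[102.0, 0.5]", "Point")],
    ["_id", "coordinates", "type"],
).select(
    "_id",
    psf.struct("coordinates", "type").alias("geometry"),
)

# Pull out individual struct fields with dot notation...
df.select(psf.col("geometry.coordinates"), psf.col("geometry.type")).show()

# ...or flatten the whole struct into top-level columns in one go.
df.select(psf.col("geometry.*")).show()
```

Either select returns plain string columns, which can then be converted with `psf.col(...).cast(...)` as the question intends.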

0 Answers