I have a DataFrame that contains a PolyLine column (from Magellan). I want to extract some of this column's fields into new columns. Here is an example of what I want to do:
spark.read
  .format("magellan")
  .load(My_Path)
  .withColumn("xcoordinates", $"polyline"("xcoordinates")) // Does not work
  .drop("polyline")
But I get the following error:
Can't extract value from polyline#1190: need struct type but got polyline;
Here is a sample of the data:
DF: (id, polyline, otherColumns)
ID1, {"xcoordinates":[55.37,55.376],"indices":[0],"empty":false,"ycoordinates":[25.23,25.232],"boundingBox":{"xmin":55.376,"ymin":25.23,"xmax":55.376,"ymax":25.234},"valid":true,"type":3}, ...
ID2, {"xcoordinates":[55.37,55.376],"indices":[0],"empty":false,"ycoordinates":[25.23,25.232],"boundingBox":{"xmin":55.376,"ymin":25.23,"xmax":55.376,"ymax":25.234},"valid":true,"type":3}, ...
ID3, {"xcoordinates":[55.37,55.376],"indices":[0],"empty":false,"ycoordinates":[25.23,25.232],"boundingBox":{"xmin":55.376,"ymin":25.23,"xmax":55.376,"ymax":25.234},"valid":true,"type":3}, ...
And an example of the expected output:
DF2: (id, xcoordinates, otherColumns)
ID1, [55.37,55.376], ...
ID2, [55.37,55.376], ...
ID3, [55.37,55.376], ...
EDIT: I finally managed to do what I wanted:
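import magellan.PolyLine
import org.apache.spark.sql.functions.udf
import spark.implicits._ // for the $"..." syntax (already in scope in spark-shell)

// Plain Scala function that reads the field from the deserialized PolyLine object
val xcoordinates = (data: PolyLine) => data.xcoordinates
// Wrap it as a Spark UDF so it can be applied to the polyline column
val getXcoordinatesUDF = udf(xcoordinates)

spark.read
  .format("magellan")
  .load(My_Path)
  .withColumn("xcoordinates", getXcoordinatesUDF($"polyline"))
  .drop("polyline")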
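The error message says field extraction needs a struct type, but the polyline column holds a Magellan shape type instead, so `$"polyline"("xcoordinates")` cannot be resolved. A UDF receives the deserialized `PolyLine` object, which is why the approach above works. Here is a minimal sketch that applies the same approach to both coordinate arrays. It assumes that the Magellan version in use exposes `ycoordinates` on `PolyLine` the same way it exposes `xcoordinates` (the sample data shows both fields). `myPath` is a hypothetical placeholder standing in for `My_Path`.

```scala
import magellan.PolyLine
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

val spark = SparkSession.builder().appName("polyline-fields").getOrCreate()
import spark.implicits._ // enables the $"..." column syntax

// Hypothetical placeholder for the question's My_Path
val myPath = "/path/to/shapefiles"

// One UDF per field; each receives the deserialized PolyLine object.
// Assumption: ycoordinates is accessible like xcoordinates in this Magellan version.
val getXcoordinatesUDF = udf((data: PolyLine) => data.xcoordinates)
val getYcoordinatesUDF = udf((data: PolyLine) => data.ycoordinates)

val df2 = spark.read
  .format("magellan")
  .load(myPath)
  .withColumn("xcoordinates", getXcoordinatesUDF($"polyline"))
  .withColumn("ycoordinates", getYcoordinatesUDF($"polyline"))
  .drop("polyline")
```

Each UDF returns an `Array[Double]`, so the new columns should come out as arrays of doubles, matching the expected `[55.37,55.376]` output.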