I have a DataFrame that contains a PolyLine column (from Magellan). I want to extract some fields of this column into new columns. Here is an example of what I am trying to do:

spark.read
  .format("magellan")
  .load(My_Path)
  .withColumn("xcoordinates", $"polyline"("xcoordinates")) // Does not work
  .drop("polyline")

But this fails with the following error:

Can't extract value from polyline#1190: need struct type but got polyline;

Here is a sample of the data:

DF : (id, polyline, otherColumns)

ID1, {"xcoordinates":[55.37,55.376],"indices":[0],"empty":false,"ycoordinates":[25.23,25.232],"boundingBox":{"xmin":55.376,"ymin":25.23,"xmax":55.376,"ymax":25.234},"valid":true,"type":3}, ...
ID2, {"xcoordinates":[55.37,55.376],"indices":[0],"empty":false,"ycoordinates":[25.23,25.232],"boundingBox":{"xmin":55.376,"ymin":25.23,"xmax":55.376,"ymax":25.234},"valid":true,"type":3}, ...
ID3, {"xcoordinates":[55.37,55.376],"indices":[0],"empty":false,"ycoordinates":[25.23,25.232],"boundingBox":{"xmin":55.376,"ymin":25.23,"xmax":55.376,"ymax":25.234},"valid":true,"type":3}, ...

And an example of the expected output:

DF2 : (id, xcoordinates, otherColumns)

ID1, [55.37,55.376], ...
ID2, [55.37,55.376], ...
ID3, [55.37,55.376], ...

EDIT: I finally managed to do what I wanted by wrapping the field access in a UDF:

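import magellan.PolyLine
import org.apache.spark.sql.functions.udf

// Read the field from the PolyLine object itself, not through Column field access
val xcoordinates = (data: PolyLine) => data.xcoordinates
val getXcoordinatesUDF = udf(xcoordinates)

spark.read
  .format("magellan")
  .load(My_Path)
  .withColumn("xcoordinates", getXcoordinatesUDF($"polyline"))
  .drop("polyline")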
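
The same pattern should work for the other fields of the column. Below is a minimal sketch, assuming that PolyLine exposes a ycoordinates member in the same way as xcoordinates (the sample rows above suggest it, but I have not checked this against every Magellan version). The names getYcoordinatesUDF and df2 are only illustrative, and a SparkSession named spark is assumed to be in scope:

```scala
import magellan.PolyLine
import org.apache.spark.sql.functions.udf
import spark.implicits._ // enables the $"..." column syntax (already imported in spark-shell)

// One UDF per field to pull out of the PolyLine value.
val getXcoordinatesUDF = udf((p: PolyLine) => p.xcoordinates)
// Assumption: ycoordinates is accessible just like xcoordinates.
val getYcoordinatesUDF = udf((p: PolyLine) => p.ycoordinates)

val df2 = spark.read
  .format("magellan")
  .load(My_Path) // same input path as in the snippets above
  .withColumn("xcoordinates", getXcoordinatesUDF($"polyline"))
  .withColumn("ycoordinates", getYcoordinatesUDF($"polyline"))
  .drop("polyline") // drop the original column only after extracting every field needed
```

Each UDF deserializes the PolyLine once per row, so if many fields are needed it may be worth returning them together from a single UDF instead.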
Nakeuh