How can I convert a pair RDD of the following type

joinResult
res16: org.apache.spark.api.java.JavaPairRDD[com.vividsolutions.jts.geom.Polygon,java.util.HashSet[com.vividsolutions.jts.geom.Polygon]] = org.apache.spark.api.java.JavaPairRDD@264b550

to a data frame? https://github.com/geoHeil/geoSparkScalaSample/blob/master/src/main/scala/myOrg/GeoSpark.scala#L72-L75

joinResult.toDF().show 

will not work as well as

Georg Heiler
  • Have you seen if this works for you? http://stackoverflow.com/questions/42405905/how-to-convert-a-javapairrdd-to-dataset – Shweta Gupta Mar 09 '17 at 08:24
  • Is it possible without a collect? – Georg Heiler Mar 09 '17 at 08:26
  • you could use `pairRDD.rdd` instead – Alex Karpov Mar 09 '17 at 09:04
  • In your case `joinResult.rdd.toDF.show` should work – Alex Karpov Mar 09 '17 at 09:06
  • This is a step in the right direction, but it will fail due to a missing encoder. – Georg Heiler Mar 09 '17 at 11:46
  • Could you provide an encoder? What is the exact error message about the missing encoder? – Alex Karpov Mar 09 '17 at 14:21
  • Unfortunately not. The problem is outlined very well here: http://stackoverflow.com/questions/36648128/how-to-store-custom-objects-in-a-datase and the approach shown at https://github.com/geoHeil/geoSparkScalaSample/blob/master/src/main/scala/myOrg/GeoSpark.scala#L76-L77 simply puts everything into a single binary column, which destroys the possibility of joins. So far I have not gotten anything more complex to work. Looking forward to contributions – Georg Heiler Mar 09 '17 at 15:01
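
One possible way around the single binary column mentioned in the comments, offered as a minimal sketch and not as a tested answer: map each polygon to its WKT string before calling `toDF`, so that Spark's built-in encoders for `String` and `Seq[String]` apply and no custom encoder is needed. The helper name `toWktDataFrame` and the column names `polygon_wkt` and `matched_wkt` are made up for illustration. The sketch assumes a Spark 2.x session and that a WKT text representation of the geometries is acceptable for the later joins.

```scala
import scala.collection.JavaConverters._

import com.vividsolutions.jts.geom.Polygon
import org.apache.spark.api.java.JavaPairRDD
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper: converts the join result into a DataFrame of WKT strings.
def toWktDataFrame(
    spark: SparkSession,
    joinResult: JavaPairRDD[Polygon, java.util.HashSet[Polygon]]): DataFrame = {
  import spark.implicits._ // brings the encoders for (String, Seq[String]) into scope

  joinResult.rdd // the underlying Scala RDD[(Polygon, java.util.HashSet[Polygon])]
    .map { case (polygon, matches) =>
      // Geometry.toText returns the WKT representation of the geometry
      (polygon.toText, matches.asScala.toSeq.map(_.toText))
    }
    .toDF("polygon_wkt", "matched_wkt") // assumed column names
}
```

Calling `toWktDataFrame(spark, joinResult).show` should then print one string column and one array-of-strings column. The geometries would have to be parsed back from WKT (for example with JTS's `WKTReader`) wherever the actual `Polygon` objects are needed again.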
