I'm using Apache Spark v3.0.1 and Apache Sedona v1.1.1, and I'm trying to read a Shapefile into a SpatialRDD. I first tried the example provided by the Sedona library (more specifically, the code inside the testShapefileConstructor method), and it worked. However, when I try to read another Shapefile, the metadata is loaded correctly but the actual data is missing: calling count on the SpatialRDD returns 0.

The shapefile I'm using is available here. It's the map of a Brazilian state. I also tried data from other states, so I guess there's something wrong with those files.

This is the code I used. I'm aware that the contents of a shapefile reside in a folder with .shp, .shx, .dbf and .prj files, so the variable path points to that folder.

import org.apache.sedona.core.formatMapper.shapefileParser.ShapefileReader
import org.apache.sedona.sql.utils.{Adapter, SedonaSQLRegistrator}
import org.apache.sedona.viz.core.Serde.SedonaVizKryoRegistrator
import org.apache.sedona.viz.sql.utils.SedonaVizRegistrator
import org.apache.spark.serializer.KryoSerializer
import org.apache.spark.sql.SparkSession

object Main {

  def main(args: Array[String]): Unit = {
    // Local session with Kryo serialization, as Sedona expects
    val spark = SparkSession.builder
      .config("spark.master", "local[*]")
      .config("spark.serializer", classOf[KryoSerializer].getName)
      .config("spark.kryo.registrator", classOf[SedonaVizKryoRegistrator].getName)
      .appName("test")
      .getOrCreate()

    SedonaSQLRegistrator.registerAll(spark)
    SedonaVizRegistrator.registerAll(spark)

    // Folder that contains the .shp, .shx, .dbf and .prj files
    val path = "/path/to/shapefile/folder"
    val spatialRDD = ShapefileReader.readToGeometryRDD(spark.sparkContext, path)
    println(spatialRDD.fieldNames)            // attribute names from the .dbf
    println(spatialRDD.rawSpatialRDD.count()) // number of geometries read
    val rawSpatialDf = Adapter.toDf(spatialRDD, spark)
    rawSpatialDf.show()
    rawSpatialDf.printSchema()
  }
}

Output:

[ID, CD_GEOCODM, NM_MUNICIP]
0
+--------+---+----------+----------+
|geometry| ID|CD_GEOCODM|NM_MUNICIP|
+--------+---+----------+----------+
+--------+---+----------+----------+
root
 |-- geometry: geometry (nullable = true)
 |-- ID: string (nullable = true)
 |-- CD_GEOCODM: string (nullable = true)
 |-- NM_MUNICIP: string (nullable = true)

I tried changing the character encoding, as pointed out here, but the results were the same with both of these settings:

System.setProperty("sedona.global.charset", "utf8")

and

System.setProperty("sedona.global.charset", "iso-8859-1")

So I still have no idea why the data fails to be read. What could be the problem?

PiFace

2 Answers

1

Currently, Sedona only supports the Shapefile types Point, Polyline, Polygon, and MultiPoint (i.e., types 1, 3, 5, and 8), according to https://github.com/apache/incubator-sedona/blob/master/core/src/main/java/org/apache/sedona/core/formatMapper/shapefileParser/parseUtils/shp/ShapeType.java

But your data might be of another type, because the Shapefile specification supports more types: https://en.wikipedia.org/wiki/Shapefile
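
To check which type a file actually declares, you can read it straight from the .shp header. Here is a minimal sketch, assuming the layout from the ESRI Shapefile specification (a 100-byte header with the big-endian file code 9994 at byte 0 and the little-endian shape type at byte 32). The object ShapeTypeCheck and its helper names are made up for this illustration and are not part of Sedona, and the supported set is simply the list given above, which may differ in other Sedona versions.

```scala
import java.io.{DataInputStream, FileInputStream}
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical helper, not part of Sedona.
object ShapeTypeCheck {

  // Shape type codes defined by the ESRI Shapefile specification.
  val shapeTypeNames: Map[Int, String] = Map(
    0 -> "Null Shape", 1 -> "Point", 3 -> "PolyLine", 5 -> "Polygon",
    8 -> "MultiPoint", 11 -> "PointZ", 13 -> "PolyLineZ", 15 -> "PolygonZ",
    18 -> "MultiPointZ", 21 -> "PointM", 23 -> "PolyLineM", 25 -> "PolygonM",
    28 -> "MultiPointM", 31 -> "MultiPatch"
  )

  // Assumption: the types listed in this answer; verify against your Sedona version.
  val supportedBySedona: Set[Int] = Set(1, 3, 5, 8)

  /** Extracts the shape type from the 100-byte main file header. */
  def shapeTypeFromHeader(header: Array[Byte]): Int = {
    require(header.length >= 100, "a .shp header is 100 bytes long")
    // Bytes 0-3: file code, big-endian, always 9994.
    val fileCode = ByteBuffer.wrap(header, 0, 4).order(ByteOrder.BIG_ENDIAN).getInt()
    require(fileCode == 9994, s"unexpected file code $fileCode, is this a .shp file?")
    // Bytes 32-35: shape type, little-endian.
    ByteBuffer.wrap(header, 32, 4).order(ByteOrder.LITTLE_ENDIAN).getInt()
  }

  /** Reads only the header of the given .shp file and returns its shape type. */
  def readShapeType(shpPath: String): Int = {
    val in = new DataInputStream(new FileInputStream(shpPath))
    try {
      val header = new Array[Byte](100)
      in.readFully(header)
      shapeTypeFromHeader(header)
    } finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val shapeType = readShapeType(args(0)) // path to the .shp file itself
    val name = shapeTypeNames.getOrElse(shapeType, "unknown")
    val verdict = if (supportedBySedona.contains(shapeType)) "listed" else "not listed"
    println(s"Shape type $shapeType ($name) is $verdict as supported")
  }
}
```

Pass the path of the .shp file itself (not the folder) as the first argument. If the reported type is one of the Z or M variants, that would be consistent with an empty SpatialRDD despite correct field names, since the field names come from the .dbf file.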

0

I had the same problem with the Wegsegment shapefile from https://www.geopunt.be/download?container=wegenregister&title=Wegenregister (the Flemish road register). QGIS opened the file just fine, so I exported it from there with the geometry type set to LineString instead of Automatic, and the exported file loaded fine in Sedona. I noticed that the original had LineStringM features (visible if you add the layer and then hover over it). When I examined the M ordinate (cf. https://gis.stackexchange.com/a/274117/211228), it turned out to be empty, so I don't think I lost anything there. Yours seems to be a Polygon, but exporting it with type Polygon instead of Automatic also makes it readable by Sedona.

Matthias