3

I am indexing a GeoJSON file (around 4,000 to 5,000 multi-polygon features) into Elasticsearch.

Here is the mapping:

"mappings": {
  "properties": {
    "type": {
      "type": "keyword"
    },
    "properties": {
      "type": "object"
    },
    "geometry": {
      "type": "geo_shape"
    }
  }
}

My code for indexing looks like this:

helpers.bulk(es, k, chunk_size=500, request_timeout=1000)

The bulk indexing (done in chunks) stops with this error message:

{'type': 'mapper_parsing_exception', 'reason': 'failed to parse field [geometry] of type [geo_shape]', 'caused_by': {'type': 'illegal_argument_exception', 'reason': 'Unable to Tessellate shape

What is the cause of this error?
Can I ignore this error when indexing geojson files?
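
On the second question, a minimal sketch of one way to keep a bulk run going past rejected shapes, assuming the elasticsearch-py client. The function names `geojson_to_actions` and `bulk_index_collecting_errors` and the use of the feature position as `_id` are hypothetical choices for illustration, not part of the original code. It relies on the `raise_on_error=False` option of `helpers.bulk`, which returns the failures instead of raising on the first one.

```python
def geojson_to_actions(feature_collection, index_name):
    """Yield one bulk action per GeoJSON feature (hypothetical helper)."""
    for i, feature in enumerate(feature_collection["features"]):
        yield {
            "_index": index_name,
            "_id": i,  # assumption: the feature's position is a usable id
            "_source": {
                "type": feature.get("type"),
                "properties": feature.get("properties"),
                "geometry": feature.get("geometry"),
            },
        }


def bulk_index_collecting_errors(es, actions, chunk_size=500):
    """Index all actions and return (ok_count, errors) instead of raising."""
    # Imported lazily so the action builder above works without the client installed.
    from elasticsearch import helpers

    # raise_on_error=False: rejected documents are collected in `errors`
    # and the remaining chunks are still sent.
    ok_count, errors = helpers.bulk(
        es,
        actions,
        chunk_size=chunk_size,
        request_timeout=1000,
        raise_on_error=False,
    )
    return ok_count, errors
```

The returned `errors` list describes the documents Elasticsearch rejected, so the shapes that fail to tessellate can be inspected or repaired separately. The rejected documents are still not indexed; this only stops one bad shape from aborting the whole run.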

Scorpioooooon21

3 Answers

4

I had a look into the issue: the polygon is valid, and it uncovers a bug in the Lucene tessellator. I opened an issue:

https://issues.apache.org/jira/browse/LUCENE-9417

And the fix is here:

https://github.com/apache/lucene-solr/pull/1614

Ignacio Vera
  • Thanks @Ignacio Vera for following up. How do I apply the fix to my current platform? – Scorpioooooon21 Jul 01 '20 at 00:57
  • The change has been committed in Lucene and it will be included in the next Lucene release (the Lucene 8.6 branch was cut yesterday). I would expect the fix to be included in the next Elasticsearch release that depends on that Lucene release. – Ignacio Vera Jul 01 '20 at 06:48
  • Thanks @IgnacioVera I can see that fix is now on the ES 7.9 https://github.com/elastic/elasticsearch/blob/7.9/buildSrc/version.properties#L2 – Oshan Wisumperuma Aug 05 '20 at 07:18
  • Hi @IgnacioVera! I tried the Elastic Search 7.9.1 docker (docker.elastic.co/elasticsearch/elasticsearch:7.9.1) containing your fix. I still get the "unable to tessellate shape" errors for self-intersecting shapes, even ones without 'holes'. Is this the expected behaviour? Would it help if I open a separate issue? – Joel Sep 11 '20 at 08:16
  • If the polygons are invalid (e.g. they have self-intersecting edges) then yes, it is expected. I think it would be better if we open a separate issue. – Ignacio Vera Sep 14 '20 at 07:21
  • Hi @IgnacioVera, I opened a separate issue regarding indexing self-intersecting polygons. Thanks for the reply. – Joel Oct 28 '20 at 09:40
1

Your geojson is syntactically correct & valid. Now you just need to make sure that you index your multi-polygons properly:

PUT demo_l08_bs
{
  "mappings": {
    "properties": {
      "geometry": {
        "type": "geo_shape"
      }
    }
  }
}

Index the geojson w/o changing anything:

POST demo_l08_bs/_doc
{
  "properties": {
    ...
  },
  "geometry": {
    "type": "MultiPolygon",
    "coordinates": [...]
  }
}

Verify a point lies within it:

GET demo_l08_bs/_search
{
  "query": {
    "geo_shape": {
      "geometry": {
        "shape": {
          "type": "point",
          "coordinates": [
            151.14646911621094,
            -33.68463933764522
          ]
        },
        "relation": "intersects"
      }
    }
  }
}


Val
Joe - GMapsBook.com
  • Yes, I can manually index this single doc, however, I want to know how to handle it via python elasticsearch helpers.bulk – Scorpioooooon21 Jun 08 '20 at 10:36
  • The problem with the shape is the fact that it is self-intersecting in many different places and ES doesn't allow that. – Val Jun 08 '20 at 10:47
  • thanks @val -- 7.2.0 ingested it w/o an issue. @Scorpioooooon21 there are already answered q's regarding `helpers.bulk` such as https://stackoverflow.com/a/61642184/8160318 What ES version are you using? – Joe - GMapsBook.com Jun 08 '20 at 10:50
  • Worth noting that on 7.6 and up I get the same issue as @Scorpioooooon21, i.e. `Unable to Tessellate shape` – Val Jun 08 '20 at 10:53
  • You can see this thread for more background info on a similar issue I encountered a while ago and self-intersection was the culprit: https://stackoverflow.com/questions/55587629/failed-to-create-valid-geo-shape/55588509#55588509 – Val Jun 08 '20 at 10:55
  • @Val, thanks for your comment. I used QGIS to validate all my polygons and none of them have self-intersections. Is there a solution you would recommend to quickly identify the polygons that have self-intersections? – Scorpioooooon21 Jun 09 '20 at 00:48
  • @Val I tried setting "chunk_size" in helpers.bulk to 300, 400, and 500, and each time a different number of documents was ingested. Could this "Unable to Tessellate shape" issue be caused by an inappropriate chunk_size? – Scorpioooooon21 Jun 09 '20 at 00:53
  • I took the liberty of editing @joe's image and you can see the little red circle at the top. If you zoom in on QGIS, you'll see that the points step on each other. That's what I mean by self-intersection. QGIS (and the GeoJSON spec) are more permissive than ES. – Val Jun 09 '20 at 03:56
  • For verifying CCW correctness, you can check this: https://mapster.me/right-hand-rule-geojson-fixer. Also I was successful in fixing self-intersections using this: https://github.com/mclaeysb/simplepolygon – Val Jun 09 '20 at 07:33
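
As a rough illustration of how self-intersecting rings could be flagged before indexing, here is a pure-Python sketch. It is not the check Elasticsearch or Lucene performs, and the names `ring_problems` and `geometry_problems` are hypothetical. It only looks inside each ring for repeated vertices ("points stepping on each other") and for non-adjacent edges that properly cross, using a brute-force pairwise test, so it is a first-pass filter rather than a full validity check.

```python
def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left, -1 right, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def _segments_cross(a, b, c, d):
    """True if segments ab and cd cross at a single point interior to both."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def ring_problems(ring):
    """Return suspicious spots found in one closed ring of [lon, lat] positions."""
    pts = [tuple(p[:2]) for p in ring]
    if len(pts) > 1 and pts[0] == pts[-1]:
        pts = pts[:-1]  # drop the closing point, which legitimately repeats
    problems = []
    seen = set()
    for p in pts:
        if p in seen:
            problems.append(("repeated_vertex", p))
        seen.add(p)
    n = len(pts)
    edges = [(pts[i], pts[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # first and last edges are adjacent via the closing point
            if _segments_cross(*edges[i], *edges[j]):
                problems.append(("crossing_edges", i, j))
    return problems


def geometry_problems(geometry):
    """Check every ring of a GeoJSON Polygon or MultiPolygon geometry."""
    if geometry["type"] == "MultiPolygon":
        polygons = geometry["coordinates"]
    else:
        polygons = [geometry["coordinates"]]
    found = []
    for p_idx, polygon in enumerate(polygons):
        for r_idx, ring in enumerate(polygon):
            for problem in ring_problems(ring):
                found.append((p_idx, r_idx) + problem)
    return found
```

Running `geometry_problems` over each feature's `geometry` gives a list of (polygon index, ring index, problem) entries; an empty list means this particular check found nothing, not that the shape is guaranteed to index.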
1

I am not sure if this error was caused by some complicated multi-polygons in the input file.

However, after converting the multi-polygons into individual polygons, inspired by the post below, I managed to ingest all shapes without any error :)

https://gist.github.com/mhweber/cf36bb4e09df9deee5eb54dc6be74d26
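
For readers who want the idea without extra dependencies, here is a minimal pure-Python sketch of splitting each MultiPolygon feature into one Polygon feature per part. It is a swapped-in illustration of the same idea, not the code from the linked gist, and the function name `explode_multipolygons` is hypothetical. It assumes each feature is a plain GeoJSON dict and copies the original `properties` onto every part.

```python
import copy


def explode_multipolygons(features):
    """Yield features, replacing each MultiPolygon with one Polygon feature per part."""
    for feature in features:
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "MultiPolygon":
            yield feature  # other geometry types pass through unchanged
            continue
        for polygon_coords in geometry["coordinates"]:
            yield {
                **feature,
                # each part gets its own copy of the properties
                "properties": copy.deepcopy(feature.get("properties")),
                "geometry": {"type": "Polygon", "coordinates": polygon_coords},
            }
```

The resulting features can then be turned into bulk actions as before. Note that every part inherits the same properties, so any identifier stored there is no longer unique per document.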

Scorpioooooon21