
I'm having some trouble uploading a GeoJSON file to my local Elasticsearch server.

curl -XPOST 'http://localhost:9200/geo/_doc' -d @earth-lands-1m.geo.json 

The file is a GeometryCollection with a lot of objects:

{"type":"GeometryCollection", "geometries": [ <polygon objects>] }

The file is 140 MB and the command produces no output. What am I doing wrong?

Valu3

1 Answer


Files that large are more or less guaranteed to fail: by default Elasticsearch rejects HTTP request bodies over ~100 MB (http.max_content_length), and a single 140 MB document would be painful to index and query anyway. I'd recommend splitting the GeometryCollection into its individual geometries (for example with geojsplit, https://github.com/woodb/geojsplit) and indexing them as separate documents through the _bulk API.
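
For instance, a rough sketch (assumptions: Elasticsearch 7.x, jq installed, the index is named geo, and the shape lives in a field called geometry; adjust these to your own setup):

# Create the index with an explicit geo_shape mapping first; otherwise the
# geometries get dynamically mapped as plain objects.
curl -XPUT 'http://localhost:9200/geo' \
     -H 'Content-Type: application/json' \
     -d '{"mappings": {"properties": {"geometry": {"type": "geo_shape"}}}}'

# Turn each member of the "geometries" array into a pair of bulk lines
# (action metadata + document), then send them in chunks that stay well
# under the ~100 MB request limit.
jq -c '.geometries[] | {index: {_index: "geo"}}, {geometry: .}' earth-lands-1m.geo.json > bulk.ndjson
split -l 500 bulk.ndjson chunk_
for f in chunk_*; do
  curl -s -XPOST 'http://localhost:9200/_bulk' \
       -H 'Content-Type: application/x-ndjson' \
       --data-binary "@$f"
done

Each 500-line chunk is 250 documents; lower the split size if some of your multipolygons are particularly large.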

Joe - GMapsBook.com
  • It seems that splitting a GeometryCollection isn't possible with the suggested library in the first link – Valu3 Sep 21 '20 at 06:14
  • Can you be more specific? – Joe - GMapsBook.com Sep 21 '20 at 07:11
  • The suggested library - https://github.com/woodb/geojsplit - doesn't work with a GeometryCollection which is the type I mentioned in the question. According to the elastic documentation https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-shape.html, this is just a collection of polygons. So I'm wondering if I can just split it to separate objects and upload each - but not 100% sure – Valu3 Sep 21 '20 at 12:10
  • Yes, splitting is the way to go. You need (multi)polygons only, you don't really care about the collection itself. Unless you have some top-level props you wanna preserve. In that case you just include them in each individual document at sync/ingest time. – Joe - GMapsBook.com Sep 21 '20 at 12:51
  • Nope, no top level props. Cool, so I can just take each member of the geometries array and bulk send the thing assuming each of them isn't too big. – Valu3 Sep 21 '20 at 13:02
  • @1 Yes. @2 Although ES supports MultiPolygons too, I think they're internally represented as individual polygons. Reindexing & searching is going to be faster if the docs are smaller, so splitting the large multipolygons is the way to go. – Joe - GMapsBook.com Sep 21 '20 at 13:24
  • What would be the downside of uploading three 50 MB multipolygons? The purpose of the DB is to query whether the multipolygons contain a given polygon. – Valu3 Sep 21 '20 at 13:52
  • 50MB is fine since the limit is ~100MB. More at https://stackoverflow.com/a/62046953/8160318 – Joe - GMapsBook.com Sep 21 '20 at 13:57
  • @Valu3 how did you manage to upload large GeoJson to ElasticSearch? Did you index the whole GeometryCollection object? (if so, can you somehow bulk index it?) Or you split GeometryCollection into multiple Geometry objects and bulk indexed them? If so, did you find any limitations (data, performance, etc.)? Thank you! – Florin Vîrdol Jan 14 '21 at 12:22
  • @JoeSorocin What are the differences and pro / cons of ingesting ElasticSearch geoshape as multiple individual Geometry objects (polygons in individual documents) rather than one single GeometryCollection (polygons in a single document)? What about indexing (execution time and performance - even when using Bulk Index)? What about querying, searching, filtering (usability, performance, etc.)? Thank you! – Florin Vîrdol Jan 14 '21 at 12:24