1

In a GeoJSON file, some properties are shared by all "features" (elements) of the collection (array), while others are defined only for a subset of it. I've found this question: [javascript] counting properties of the objects in an array of objects, but it doesn't solve my problem.

Example:

const features =
[ {"properties":{"name":"city1","zip":1234}, "geometry":{"type":"polygon","coordinates":[[1,2],[3,4] ...]}},
  {"properties":{"name":"city2","zip":1234}, "geometry":{"type":"polygon","coordinates":[[1,2],[3,4] ...]}},
  {"properties":{"name":"city3"},"geometry":{"type":"multiPolygon","coordinates":[[[1,2],[3,4] ...]]}},
// ... for instance 1000 different cities
  {"properties":{"name":"city1000","zip":1234,"updated":"May-2018"}, "geometry":{"type":"polygon","coordinates":[...]}}
];

Expected result: a list of all existing properties and their cardinality, showing how (in)complete the data set is. For instance:

properties: 1000, properties.name: 1000, properties.zip: 890, properties.updated: 412,
geometry: 1000, geometry.type: 1000, geometry.coordinates: 1000

I have a (rather complicated) solution, but I suspect that others have already faced the same issue (it seems like a data science classic) and found a better one (performance matters).

Here is my clumsy solution:

// 1: list all properties encountered in the features array, at least two levels deep
const countProps = af => af.reduce((pf,f) =>
                                            Array.from(new Set(pf.concat(Object.keys(f)))), []);
// adding all the properties of each individual feature, then removing duplicates using the array-set-array trick
const countProp2s = af => af.reduce((pf,f) =>
                                            Array.from(new Set(pf.concat(Object.keys(f.properties)))), []);
const countProp2g = af => af.reduce((pf,f) =>
                                            Array.from(new Set(pf.concat(Object.keys(f.geometry)))), []);

// 2: counting the number of defined occurrences of each property of the list 1
const countPerProp =  (ff) => pf => ` ${pf}:${ff.reduce((p,f)=> p+(!!f[pf]), 0)}`;
const countPerProp2s = (ff) => pf => ` ${pf}:${ff.reduce((p,f)=> p+(!!f.properties[pf]), 0)}`;
const countPerProp2g = (ff) => pf => ` ${pf}:${ff.reduce((p,f)=> p+(!!f.geometry[pf]), 0)}`;
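// `ff` was undefined here; pass `features`, and join the three lists so each entry is comma-separated
const cardinalities = [
  ...countProps(features).map(kk => countPerProp(features)(kk)),
  ...countProp2s(features).map(kk => countPerProp2s(features)(kk)),
  ...countProp2g(features).map(kk => countPerProp2g(features)(kk))
].join(',');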

Therefore, there are three issues:

-Step 1: this is a lot of work (adding everything, then removing most of it) for a rather simple operation. Moreover, it isn't recursive, and the second level is "manually forced".

-Step 2: a recursive solution would probably be better.

-Can steps 1 and 2 be performed in a single pass (starting the count as soon as a new property is encountered)?

I would welcome any idea.

allez l'OM
  • Getting the list of properties recursively is solved in https://stackoverflow.com/q/15690706/215552 – Heretic Monkey Nov 30 '19 at 00:02
  • Specifically for GeoJSON, you don't need to inspect `geometry` at all, because it exists on all features. So, recursion is not needed. – georg Nov 30 '19 at 00:16
  • @georg: you're perfectly right, but I had to deal with some open source data where people didn't fill in all fields, in particular "tricking" the geometry with a "geometry":null value. In that case geometry.type doesn't exist, but the file is still accepted as a GeoJSON file. (That wasn't me, I swear.) The next 2 answers are perfect, and we can choose either, depending on the situation. – allez l'OM Dec 01 '19 at 11:16

3 Answers

2

The JSON.parse reviver and the JSON.stringify replacer can both be used to visit all key-value pairs:

var counts = {}, json = `[{"properties":{"name":"city1","zip":1234}, "geometry":{"type":"polygon","coordinates":[[1,2],[3,4]]}},{"properties":{"name":"city2","zip":1234}, "geometry":{"type":"polygon","coordinates":[[1,2],[3,4]]}},{"properties":{"name":"city3"},"geometry":{"type":"multiPolygon","coordinates":[[[1,2],[3,4]]]}},{"properties":{"name":"city1000","zip":1234,"updated":"May-2018"}, "geometry":{"type":"polygon","coordinates":[]}} ]`

var features = JSON.parse(json, (k, v) => (isNaN(k) && (counts[k] = counts[k] + 1 || 1), v))

console.log( counts, features )
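
When the features already exist as objects rather than as a JSON string, the replacer half of the same idea can be used instead. A minimal sketch, assuming a small made-up `sample` array; `countKeys` is a hypothetical helper name, and the serialized string is simply discarded:

```javascript
// Hypothetical helper: counts key names at any depth via the JSON.stringify replacer.
function countKeys(value) {
  const counts = {};
  JSON.stringify(value, (k, v) => {
    // Skip the root key "" and array indices ("0", "1", ...), as in the reviver version.
    if (isNaN(k)) counts[k] = (counts[k] || 0) + 1;
    return v; // return the value unchanged so traversal continues
  });
  return counts;
}

// Illustrative data only (not from the original question).
const sample = [
  { properties: { name: "city1", zip: 1234 }, geometry: { type: "Polygon", coordinates: [[1, 2], [3, 4]] } },
  { properties: { name: "city2" }, geometry: { type: "Polygon", coordinates: [[1, 2], [3, 4]] } }
];

console.log(countKeys(sample));
```

Note that keys are counted by name only, so a `type` key at two different nesting levels would end up in the same bucket.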
Slai
  • Slai: I like your "single line" solution, and up-vote for it. Moreover, It would work for properties of any depth level. I will also vote for @George Jempty solution because it fits my specific 2-levels question, and runs a bit faster with Firefox 70. See: https://jsperf.com/counting-properties-in-array-of-objects/1 – allez l'OM Dec 01 '19 at 11:01
  • @allezl'OM then I recommend trying the `JSON.parse` alternative too, as parsing JSON is faster than parsing JS – Slai Dec 01 '19 at 11:25
  • Slai: I did update the jsperf test, you're right. But in my case, I have features objects, mostly after intermediate processing, where I need to control modified properties, then _JSON.parse(JSON.stringify(features), f)_ is not efficient. – allez l'OM Dec 02 '19 at 11:23
  • NB: perf comparisons are not consistent between different browsers (Firefox, Edge, Chrome). It's really hard to draw any sustainable conclusion. – allez l'OM Dec 02 '19 at 11:48
1

Consider trying the following. It is just one reduce, with a couple of nested forEach's inside. It checks whether the keys for indicating the count exist in the object to be returned, and if not creates them initialized to 0. Then whether those keys existed or not to begin with, their corresponding values get incremented by 1.

Repl is here: https://repl.it/@dexygen/countobjpropoccur2levels , code below:

const features =
[ {"properties":{"name":"city1","zip":1234}, "geometry":{"type":"polygon","coordinates":[[1,2],[3,4]]}},
  {"properties":{"name":"city2","zip":1234}, "geometry":{"type":"polygon","coordinates":[[1,2],[3,4]]}},
  {"properties":{"name":"city3"},"geometry":{"type":"multiPolygon","coordinates":[[[1,2],[3,4]]]}},
  {"properties":{"name":"city1000","zip":1234,"updated":"May-2018"}, "geometry":{"type":"polygon","coordinates":[]}}
];

const featuresCount = features.reduce((count, feature) => {
  Object.keys(feature).forEach(key => {
    count[key] = count[key] || 0;
    count[key] += 1;
    Object.keys(feature[key]).forEach(key2 => {
      let count2key = `${key}.${key2}`;
      count[count2key] = count[count2key] || 0;
      count[count2key] += 1;
    });
  });
  return count;
}, {});

console.log(featuresCount);

/*
{ properties: 4,
  'properties.name': 4,
  'properties.zip': 3,
  geometry: 4,
  'geometry.type': 4,
  'geometry.coordinates': 4,
  'properties.updated': 1 }
*/
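
The two nested loops above could be generalized to any depth, and made tolerant of a `"geometry": null` entry like the one mentioned in the comments on the question. This is only a sketch under assumptions: `countPaths` is a hypothetical name, arrays are treated as leaves, and a key whose value is `null` or `undefined` is deliberately not counted as defined.

```javascript
// Hypothetical helper: counts, per dotted path, how many items define that property.
function countPaths(items) {
  const counts = {};
  const visit = (obj, prefix) => {
    for (const key of Object.keys(obj)) {
      const value = obj[key];
      // Assumption: null/undefined values are treated as "not defined".
      if (value == null) continue;
      const path = prefix ? `${prefix}.${key}` : key;
      counts[path] = (counts[path] || 0) + 1;
      // Only descend into plain objects; arrays and primitives are leaves.
      if (typeof value === "object" && !Array.isArray(value)) visit(value, path);
    }
  };
  for (const item of items) visit(item, "");
  return counts;
}

// Illustrative data only, including a null geometry and a falsy zip.
const demo = [
  { properties: { name: "a", zip: 1 }, geometry: { type: "Polygon", coordinates: [] } },
  { properties: { name: "b" }, geometry: null },
  { properties: { name: "c", zip: 0 }, geometry: { type: "Point", coordinates: [1, 2] } }
];

console.log(countPaths(demo));
```

Unlike a `!!value` test, this counts falsy but present values such as `zip: 0`. Property names that themselves contain a dot would make the paths ambiguous.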
Dexygen
  • George: your solution perfectly fits my GeoJSON (2-levels) question, and runs a bit faster than Slai's, with Firefox 70. See: jsperf.com/counting-properties-in-array-of-objects/1 . I will check perf with my full-length array of 35000 cities. – allez l'OM Dec 01 '19 at 11:04
  • @allezl'OM I tried for a while to convert my code into a function that would take a "level" argument but had other things to do LOL. But I'm wondering if one thing might help your performance, maybe substituting `count[key] = count[key] || 0;` with `if (count[key] === undefined) count[key] = 0` – Dexygen Dec 01 '19 at 18:31
  • George: you're right, the **if** version is even faster (Firefox 70): this is probably because new (undefined) keys are a minority, which is true in large real-world cases. See: [jsperf.com/counting-properties-in-array-of-objects/1](https://jsperf.com/counting-properties-in-array-of-objects/1) – allez l'OM Dec 02 '19 at 11:14
  • NB: not true with Edge, _.v1_ is better than _.v2_. No reliable conclusion. – allez l'OM Dec 02 '19 at 11:50
-1

Use polymorphic serialization of JSON with Jackson. It will look something like the example below. Your base interface holds all the common properties, and you create a subtype for each variation. Counting the instances of each type gives you what you need.

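import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;

// Lion and Tiger are example implementations of Animal (not shown).
@JsonTypeInfo(use=JsonTypeInfo.Id.NAME, include=JsonTypeInfo.As.PROPERTY, property="name")
@JsonSubTypes({
    @JsonSubTypes.Type(value=Lion.class, name="lion"),
    @JsonSubTypes.Type(value=Tiger.class, name="tiger"),
})
public interface Animal { }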

JUser