
For an ordinary .geojson file:

{ "type": "FeatureCollection",
    "features": [
      { "type": "Feature",
        "geometry": {
          "type": "Point",
          "coordinates": [102.0, 0.5]
          },
          "properties": {
            "label": "value0",
             ...
          }
        },
       { "type": "Feature",
         "geometry": {..

I would like to delete every feature (whether its geometry is a "Point", a "Polygon", or even a "MultiPolygon") whose properties contain the key 'label'.

   import json

   file = open('file.geojson', 'r')
   data = json.load(file)
   
   for k in data['features']:
       if 'label' in k['properties']:
           print('ok') #displays the right number of structure, so the loop works
           del k #doesn't work
           #data['features'].remove(k) works but delete only a part of elements in the file..
           print('okk') #displays so del doesn't work
   
   data_srt = open('file.geojson', 'w')
   data_srt.write(json.dumps(data, ensure_ascii=False, indent=2))
   data_srt.close()

Neither approach works. Why? Thank you very much.

Tim
  • They don't work because the syntax of the code is wrong due to unmatched quotes. – mkrieger1 Dec 22 '20 at 00:30
  • I edited, obviously my problem remains the same. – Tim Dec 22 '20 at 00:33
  • I guess I'm not understanding. What exactly is the problem? You said it doesn't work, would you please expand on that? What was wrong with `data['features'].remove(k)`? – codewelldev Dec 22 '20 at 00:42
  • With `del k` my file is exactly the same in output and with `data['features'].remove(k)` only a part of the structures that have a key `label` are deleted... I tried with a file that have only two structures that both have a key `label`, only the first one is delete (the first k in fact). – Tim Dec 22 '20 at 00:52
  • Yes `print('ok')` works. – Tim Dec 22 '20 at 00:55
  • I usually use `print()` statements to debug any code that I am working with, have you tried `print(data)` just to see what the dictionary looks like, to make sure that the format is consistent? – codewelldev Dec 22 '20 at 01:01
  • I am having a difficult time recreating this problem. For me, `data["features"].remove(k)` removes all structures with a `'label'` key. – codewelldev Dec 22 '20 at 01:09
  • Yes, I tried `print(data)`, look like good. I even tried with a standard .geojson file like [here](https://en.wikipedia.org/wiki/GeoJSON) by adding in each `properties` a `label` key, It doesn't work. – Tim Dec 22 '20 at 01:31

1 Answer


Consider this example:

l = [1, 2, 3]
for i in l:
    l.remove(i)

print(l) # Prints '[2]'

During the first iteration, `i` is `l[0]`, so `l.remove(i)` removes the first item and `l` becomes `[2, 3]`. The loop's internal index then advances to 1, so in the second iteration `i` is `l[1]`, which is now `3`; the value `2` has shifted into position 0 and is never visited. After `3` is removed, `l == [2]`. A third iteration would need `l[2]`, but `len(l) == 1`, so the loop ends even though `l` is not empty. This is essentially the problem you are experiencing: every removal shifts the following items one position to the left, and the loop skips the item that moves into the slot just vacated.
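To make the skipped item visible, here is a small trace of the same loop. It is only a sketch: `visited` is a hypothetical helper list added for illustration and is not part of the original example.

```python
l = [1, 2, 3]
visited = []  # hypothetical helper: records every value the loop actually sees

for i in l:
    visited.append(i)
    l.remove(i)  # shrinks the list while the loop is still walking it

print(visited)  # [1, 3]  (2 was never visited)
print(l)        # [2]
```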

To fix this:

i = len(l) - 1
while i >= 0:
    l.pop(i)  # delete by index; remove() would search for the first equal value
    i -= 1

Iterating over a list backwards like this avoids the problem: removing the item at index `i` only shifts items at higher indices, and those have already been visited.

To apply this concept in your situation, this is the solution:

i = len(data["features"]) - 1
while i >= 0:
    if "label" in data["features"][i]["properties"]:
        data["features"].pop(i)
    i -= 1

I just came up with a new, better solution (it uses the reversed() function):

for k in reversed(data["features"]):
    if "label" in k["properties"]:
        data["features"].remove(k)

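This is the same backwards iteration, but the `reversed()` function handles the index bookkeeping for you. Note that `remove(k)` searches the list for the first item equal to `k`, so on large files it is slower than deleting by index.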

The reason `del k` had no effect is a separate issue. I'll do my best to explain (here is another answer that helps explain it: https://stackoverflow.com/a/14814847/13911868).

When iterating through a list, or any container, in a for loop, like this:

l = [1, 2, 3]
for i in l:
    del i

Here `i` is just a name bound to the current item of `l`; it refers to the same object, not to a copy. `del i` removes only the name `i` from the current scope. It does not remove anything from `l`, so the list is unchanged.
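A quick way to check what `del` does to a loop variable is to compare object identities and the list length. This is a sketch with made-up sample data; `features` and `identity_checks` are hypothetical names used only for illustration.

```python
# Made-up sample data shaped like the "features" list of a GeoJSON file.
features = [
    {"properties": {"label": "value0"}},
    {"properties": {"label": "value1"}},
]

identity_checks = []  # hypothetical helper: is k one of the list's own objects?
for k in features:
    identity_checks.append(any(k is f for f in features))
    del k  # unbinds the name k only; the list still holds the object

print(identity_checks)  # [True, True]
print(len(features))    # 2
```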

On the other hand, in this example:

l = [1, 2, 3]
for i in range(len(l)):
    del l[i]

`del l[i]` does modify `l`, because deleting a subscript asks the list itself to remove the item stored at index `i`.

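This forward loop still fails, though: the list shrinks while the indices produced by `range(len(l))` keep growing, so `del l[2]` raises an `IndexError` once only one item is left. A working solution using the `del` statement iterates over the indices in reverse: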

for k in reversed(range(len(data["features"]))):
    if "label" in data["features"][k]["properties"]:
        del data["features"][k]
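As an alternative to deleting in place, a new list can be built from the features to keep. This is a different technique from the loops above (a list comprehension), shown here as a minimal sketch. The names `drop_labeled`, `rewrite_file`, and `sample` are hypothetical, and treating a null `properties` value as "no label" is an assumption about the data.

```python
import json


def drop_labeled(data):
    """Return a new collection without features whose properties contain 'label'."""
    kept = [
        feature
        for feature in data["features"]
        # Assumption: "properties" may be missing or null, which counts as no label.
        if "label" not in (feature.get("properties") or {})
    ]
    return {**data, "features": kept}


def rewrite_file(path):
    """Load a GeoJSON file, drop the labeled features, and write it back."""
    with open(path, "r", encoding="utf-8") as src:
        data = json.load(src)
    cleaned = drop_labeled(data)
    with open(path, "w", encoding="utf-8") as dst:
        json.dump(cleaned, dst, ensure_ascii=False, indent=2)
    return cleaned


# Made-up sample data for illustration.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [102.0, 0.5]},
         "properties": {"label": "value0"}},
        {"type": "Feature",
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]},
         "properties": {"name": "kept"}},
        {"type": "Feature",
         "geometry": {"type": "MultiPolygon",
                      "coordinates": [[[[0, 0], [1, 0], [1, 1], [0, 0]]]]},
         "properties": {"label": "value2"}},
    ],
}

result = drop_labeled(sample)
print(len(result["features"]))  # 1
```

Because nothing is removed from a list while it is being iterated, there is no index shifting to worry about, and the `with` blocks close the file even if an error occurs.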
codewelldev
  • Thanks a lot for your explanations ! And I still don't understand why `del` doesn't work... would you know? – Tim Dec 22 '20 at 12:57
  • @Tim I just added to my answer to include an explanation about the `del` statement in your situation. – codewelldev Dec 22 '20 at 22:14