
I have a KML file that looks like the one below, except that the "Document" portion is repeated roughly another 2,000 times, with slightly different coordinates in each entry.

<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.0">
    <Document>
        <Placemark>
            <Polygon>
                <extrude>1</extrude>
                <tesselate>1</tesselate>
                <altitudeMode>relativeToGround</altitudeMode>
                <outerBoundaryIs>
                    <LinearRing>
                        <coordinates>
                            -89.634425,40.73053,16  
                            -89.633951,40.73053,16  
                            -89.633951,40.73013,16  
                            -89.634425,40.73013,16
                        </coordinates>
                    </LinearRing>
                </outerBoundaryIs>
            </Polygon>
            <Style>
                <PolyStyle>
                    <color>#5a14F000</color>
                    <outline>1</outline>
                </PolyStyle>
            </Style>
        </Placemark>
    </Document>
</kml>

The file was exported from Google Earth. I'm trying to upload it into a mapping tool (like CartoDB or Mapbox), but the file is rejected as having errors. I've run the file through a KML validator (KMLValidator) and determined that the following changes are needed to get it to upload:

1) Replace line 2 (the opening <kml> tag) with:

<kml xmlns="http://www.opengis.net/kml/2.2"
 xmlns:gx="http://www.google.com/kml/ext/2.2">

2) "Close the coordinates". The coordinates currently listed are essentially a square (4 corners). To satisfy the validator, I have to close the polygon by repeating the first set of coordinates at the end. So the target would be:

                <coordinates>
                    -89.634425,40.73053,16  
                    -89.633951,40.73053,16  
                    -89.633951,40.73013,16  
                    -89.634425,40.73013,16
                    -89.634425,40.73053,16  
                </coordinates>
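
For reference, the ring-closing step in 2) can be sketched on the coordinate text alone, independent of any KML library. This is only a minimal sketch: the names `close_ring` and `sample` are illustrative (not from pykml or lxml), and it assumes the coordinate tuples are separated by whitespace only.

```python
def close_ring(coordinates_text):
    """Return the coordinate tuples with the first tuple repeated at the end."""
    # KML separates coordinate tuples with whitespace (spaces, tabs, newlines).
    points = coordinates_text.split()
    # Append only when the ring is non-empty and not already closed.
    if points and points[0] != points[-1]:
        points.append(points[0])
    return "\n".join(points)


# Illustrative input: the four corners from the sample file above.
sample = """
    -89.634425,40.73053,16
    -89.633951,40.73053,16
    -89.633951,40.73013,16
    -89.634425,40.73013,16
"""
closed = close_ring(sample)
print(closed)
```

Checking whether the ring is already closed keeps the function safe to run twice on the same file. The original indentation inside the element is not preserved, which should not matter to a validator.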

However, I'm having trouble updating the coordinates in an efficient way. So far this is the best I could come up with (with help from the post linked in the code comment below):

from os import path

from lxml import etree
from pykml import parser

# Raw string so the backslash in the Windows path is taken literally
kml_file = path.join(r'C:\Development', 'sample.kml')

# Source: https://stackoverflow.com/questions/13712132/extract-coordinates-from-kml-batchgeo-file-with-python
# Read as bytes: lxml rejects unicode strings that carry an encoding declaration
root = parser.fromstring(open(kml_file, 'rb').read())

coordinates_before = root.Document.Placemark.Polygon.outerBoundaryIs.LinearRing.coordinates

# Print the coordinates (only prints the first branch)
print('coordinates before')
print(coordinates_before)

# Set the coordinates to a new value - attempting to update to new values
# Get errors from this
root.Document.Placemark.Polygon.outerBoundaryIs.LinearRing.coordinates = coordinates_before + "1,1,1"

coordinates_after = root.Document.Placemark.Polygon.outerBoundaryIs.LinearRing.coordinates

print('coordinates after')
print(coordinates_after)


# Create and store a string representation of the full KML tree
root_string = etree.tostring(root, pretty_print=True).decode('utf-8')

# Print the string representation using pretty_print
print(root_string)

As you can see, I can manage to add an additional set of values (1,1,1), but:

a) I'm not using the values from the first coordinate (just dummy values).

b) It only updates the first branch (how can I scale it to repeat another 2,000 times?).

c) After the update, the output file shows this text:

<coordinates xmlns:py="http://codespeak.net/lxml/objectify/pytype" py:pytype="str">

Apologies if this is an overly in-depth question. I've been struggling with this for too long, and it seems like there should be an easy way that I'm missing. Thanks in advance for any help.

CodeMonkey
  • Hold on, I actually did almost this exact same thing in Python before, let me grab you the link – Jaron Thatcher Jun 14 '16 at 19:02
  • Ok I'm not sure how helpful this will be, but it might be good to look at https://github.com/thatchej/kml-parse/blob/master/parse_kml.py – Jaron Thatcher Jun 14 '16 at 19:03
  • Thanks Jaron, that did help me some. I figured out how to iterate over the tree using your regex function, and I can grab and update the coordinates, but I can't figure out how to store the value in the tree. With a bit more struggling I think I'll get it. – Justin R. Locke Jun 15 '16 at 04:59
  • Yeah, you can see my use case was slightly different in that I needed to create a ton of different files, so I could rewrite them appropriately. Rewriting the files in place is a pretty hard problem with Python. A couple of (maybe hacky) solutions could be rewriting the whole file exactly as you want, or somehow using `subprocess.Popen()` to make the changes with `awk`. These are likely very mediocre solutions, so take them with a grain of salt :) – Jaron Thatcher Jun 15 '16 at 19:16
  • Also re: using regex to parse HTML/XML/KML: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Jaron Thatcher Jun 15 '16 at 19:17

1 Answer


Answered my own question. Thanks to Jaron for getting me unstuck (although I ended up using ElementTree rather than regex). Once I got more familiar with navigating a nested tree, I managed to figure it out. It also helped to get more familiar with .findall() and using it in a for loop. Thanks!

import xml.etree.ElementTree as ET
from ast import literal_eval


def close_the_polygons(kml_file, color_name):
    # Get the tree from the KML file (color_name is only used when writing the output file name)

    tree = ET.parse(kml_file)
    root = tree.getroot()

    # iterate through tree to get to coordinate level

    for Document in root:
        for Placemark in Document.findall('Placemark'):
            for Polygon in Placemark.findall('Polygon'):
                for outerBoundaryIs in Polygon.findall('outerBoundaryIs'):
                    for LinearRing in outerBoundaryIs: # don't use Findall here because only 1 subelement
                        for coordinates in LinearRing: # don't use Findall here because only 1 subelement

                            # Convert the coordinates to text so the delimiters become ordinary text (i.e. repr)
                            coordinates_text_before = repr(coordinates.text)

                            # Split the text on the (escaped) tab delimiters
                            coordinates_split_before = coordinates_text_before.split("\\t")

                            # # Store each entry of the array
                            entry_1 = coordinates_split_before[1]
                            entry_2 = coordinates_split_before[2][2:]
                            entry_3 = coordinates_split_before[3][2:]
                            entry_4 = coordinates_split_before[4][2:-10]
                            entry_5 = entry_1    #this solves the underlying problem of closing the polygon  

                            # # # consolidate into a single item array, goal with delimiters is to get it back to original text
                            string_updated_coordinates = "'\\n\\t"+entry_1+"\\t\\n"+entry_2+"\\t\\n"+entry_3+"\\t\\n"+entry_4+"\\t\\n"+entry_5+"\\n'"
                            updated_coordinates = literal_eval(string_updated_coordinates)

                            # # Store Updated Coordinates into original coordinates location
                            coordinates.text = updated_coordinates

    # # Write back to a file once all updates are complete
    tree = ET.ElementTree(root)
    tree.write('new_data_%s_closedPolygon_v1.0.kml' %color_name, xml_declaration=True)

    return
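
One caveat with the function above: when the `<kml>` root declares a default namespace, as the sample file in the question does, ElementTree stores tags in the form `{namespace}Placemark`, so a bare `findall('Placemark')` only matches if the namespace has been removed from the file first. The version also relies on tab delimiters and exactly four points per ring. A possible namespace-aware variant is sketched below. It is not the code I actually ran; `close_all_rings`, `KML_NS` and `sample` are illustrative names, and it assumes every `LinearRing` should be closed.

```python
import io
import xml.etree.ElementTree as ET

# Namespace of the sample file in the question (an assumption for other files).
KML_NS = "http://earth.google.com/kml/2.0"


def close_all_rings(source, namespace=KML_NS):
    """Parse KML from a path or binary file object and close every LinearRing."""
    # Keep the default namespace unprefixed when the tree is written back out.
    ET.register_namespace("", namespace)
    tree = ET.parse(source)
    # iter() walks the whole tree, so nesting depth and repeat count do not matter.
    for ring in tree.getroot().iter("{%s}LinearRing" % namespace):
        coords = ring.find("{%s}coordinates" % namespace)
        if coords is None or not coords.text:
            continue
        points = coords.text.split()
        # Repeat the first point unless the ring is already closed.
        if points and points[0] != points[-1]:
            points.append(points[0])
        coords.text = "\n" + "\n".join(points) + "\n"
    return tree


# Illustrative in-memory input; a file path works the same way.
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.0">
  <Document><Placemark><Polygon><outerBoundaryIs><LinearRing>
    <coordinates>
      -89.634425,40.73053,16
      -89.633951,40.73053,16
      -89.633951,40.73013,16
      -89.634425,40.73013,16
    </coordinates>
  </LinearRing></outerBoundaryIs></Polygon></Placemark></Document>
</kml>"""

tree = close_all_rings(io.BytesIO(sample))
out = io.BytesIO()
tree.write(out, xml_declaration=True, encoding="UTF-8")
print(out.getvalue().decode("utf-8"))
```

This sketch does not change the namespace to `http://www.opengis.net/kml/2.2`; that step from the question still has to be handled separately.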