I have a string that looks like this:

POLYGON ((148210.445767647 172418.761192525, 148183.930888667 172366.054787545, 148183.866770629 172365.316772032, 148184.328078148 172364.737139913, 148220.543522168 172344.042601933, 148221.383518338 172343.971823159), (148221.97916844 172344.568316375, 148244.61381946 172406.651932395, 148244.578100039 172407.422441673, 148244.004662562 172407.938319453, 148211.669446582 172419.255646473, 148210.631989339 172419.018894911, 148210.445767647 172418.761192525))

I can easily strip POLYGON out of the string to focus on the numbers, but I'm kinda wondering what the easiest/best way would be to parse this string into a list of lists of dicts.

The outer parenthesis (right after POLYGON) indicates that multiple polygons can be provided, each wrapped in its own parentheses and separated by a comma (,).

Each pair of numbers is supposed to be an x and a y coordinate.

I'd like to parse this string to end up with the following data structure (using Python 2.7):

[  # list of polygons
  [  # polygon n°1
    {  # polygon n°1's first point
      'x': 148210.445767647,  # first number
      'y': 172418.761192525   # second number
    },
    {  # polygon n°1's second point
      'x': 148183.930888667,
      'y': 172366.054787545
    },
    ...  # rest of polygon n°1's points
  ],  # end of polygon n°1
  [  # polygon n°2
    {  # polygon n°2's first point
      'x': 148221.97916844,
      'y': 172344.568316375
    },
    ...  # rest of polygon n°2's points
  ]  # end of polygon n°2
]  # end of list of polygons

A polygon can have a virtually unlimited number of points.
The two numbers of each point are separated by a space.

Do you guys know a way to do this with a loop, or recursively?

PS: I'm kind of a Python beginner (only a few months under my belt), so don't hesitate to explain in detail. Thank you!

Bach
Michael De Keyser
  • The only solution that comes to my mind is using the substring mechanism like `string[xx:xx]` over and over again together with `string.index('character')` but I've seen so many pretty solutions from python devs on SO that I believe there must be a prettier solution than this. I'm not asking for people to actually do the work for me, but a lead or two would be awesome. – Michael De Keyser May 21 '14 at 13:51
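For comparison, the `string[xx:xx]` plus `string.index(...)` idea from that comment can be written as a plain loop. This is only a sketch: `split_rings` is a hypothetical helper name, and it assumes the rings contain no nested parentheses (true for the string in the question). It returns the raw text of each ring; splitting that text on `, ` and then on spaces is the remaining step.

```python
def split_rings(polygon_text):
    """Return the text inside each innermost '(...)' group, using str.index and slicing."""
    rings = []
    start = 0
    while True:
        try:
            # the next ')' closes either an innermost group or the outer group
            close = polygon_text.index(')', start)
        except ValueError:
            break  # no ')' left: every group has been visited
        # look for a '(' between the previous ')' and this one
        open_pos = polygon_text.rfind('(', start, close)
        if open_pos != -1:  # -1 means this ')' closes the outer group
            rings.append(polygon_text[open_pos + 1:close])
        start = close + 1
    return rings


text = 'POLYGON ((1 2, 3 4, 5 6), (7 8, 9 10, 11 12))'
print(split_rings(text))  # ['1 2, 3 4, 5 6', '7 8, 9 10, 11 12']
```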

3 Answers

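The data structure you have defining your Polygon object looks very similar to a Python tuple declaration; the main difference is that the two numbers of each pair are separated by a space instead of a comma. One option, albeit a bit hacky, would be to use Python's `ast.literal_eval`.

You would have to strip off the POLYGON part and turn each `x y` pair into `(x,y)` first, and this solution may not work for other declarations that are more complex.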

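import ast
your_str = "POLYGON (...)"  # your full POLYGON string
# may be better to use a regex to split off the class part
# if you have different types
body = your_str.replace("POLYGON ", "")
# literal_eval cannot read "x y" pairs, so rewrite each one as "(x,y)";
# the trailing ",)" keeps a single-ring polygon as a tuple of rings
data = ast.literal_eval("(" + body.replace(", ", "),(").replace(" ", ",") + ",)")
# data is a tuple of rings, each ring a tuple of (x, y) float pairs
for ring in data:
    x, y = zip(*ring)  # all x values and all y values of this ring
    # now you can zip x and y back together or make them into dictionaries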
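To see why the pairs have to be rewritten before `ast.literal_eval` can read them, here is a small check on a shortened, made-up one-ring string in the question's format (the coordinates are illustrative only).

```python
import ast

raw = '((1.5 2.5, 3.5 4.5))'  # made-up one-ring example in the question's format

try:
    ast.literal_eval(raw)
except SyntaxError:
    # "1.5 2.5" is two numbers with nothing between them, so it is not a literal
    print('raw pairs are not a valid Python literal')

# Rewrite every "x y" pair as "(x,y)"; the trailing ",)" makes the outer value
# a tuple of rings even though there is only one ring here
fixed = '(' + raw.replace(', ', '),(').replace(' ', ',') + ',)'
print(fixed)                    # (((1.5,2.5),(3.5,4.5)),)
print(ast.literal_eval(fixed))  # (((1.5, 2.5), (3.5, 4.5)),)
```

Without the trailing comma, `'(((1.5,2.5),(3.5,4.5)))'` would just be a parenthesised tuple of points, so a one-ring polygon would lose its outer level.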
GWW
  • I don't understand the `x, y = data` part. Will x and y be lists of strings/numbers, so that I would create my dictionary using a loop? Also, if there is more than one polygon defined in the string, how will I know where the first stops and the second starts? – Michael De Keyser May 21 '14 at 14:02
  • @MichaelDeKeyser: sorry the `x,y = data` part is just shorthand for `x = data[0]` and `y = data[1]`. `x and y` will both contain the lists of floating point numbers. – GWW May 21 '14 at 14:04

Let's say you have a string that looks like this:

my_str = 'POLYGON ((148210.445767647 172418.761192525, 148183.930888667 172366.054787545, 148183.866770629 172365.316772032, 148184.328078148 172364.737139913, 148220.543522168 172344.042601933, 148221.383518338 172343.971823159), (148221.97916844 172344.568316375, 148244.61381946 172406.651932395, 148244.578100039 172407.422441673, 148244.004662562 172407.938319453, 148211.669446582 172419.255646473, 148210.631989339 172419.018894911, 148210.445767647 172418.761192525))'

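my_str = my_str.replace('POLYGON ', '')
coords_groups = my_str.split('), (')

polygons = []  # one list of point dicts per group
for coords in coords_groups:
    # strings are immutable: replace() returns a new string, so keep it
    coords = coords.replace('(', '').replace(')', '')
    coords_list = coords.split(', ')
    coords_list2 = []
    for item in coords_list:
        item_split = item.split(' ')
        # a dict needs 'key': value pairs; float() turns the text into numbers
        coords_list2.append({'x': float(item_split[0]), 'y': float(item_split[1])})
    polygons.append(coords_list2)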

I think this should help a little

All you need now is a way to get the text between parentheses; this should help: Regular expression to return text between parenthesis
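As one possible version of that idea, the sketch below uses `re.findall` with the pattern `\(([^()]*)\)`, which captures every innermost parenthesised group, and then splits each group into points. The pattern is an assumption and not necessarily the one from the linked question, `parse_polygon` is a hypothetical name, and the sketch assumes rings are never nested inside each other.

```python
import re


def parse_polygon(text):
    """Return a list of rings, each a list of {'x': ..., 'y': ...} dicts."""
    polygons = []
    # [^()]* cannot cross a parenthesis, so each match is an innermost group
    for ring_text in re.findall(r'\(([^()]*)\)', text):
        ring = []
        for pair in ring_text.split(','):
            # split() without arguments ignores the space left after each comma
            x_text, y_text = pair.split()
            ring.append({'x': float(x_text), 'y': float(y_text)})
        polygons.append(ring)
    return polygons


rings = parse_polygon('POLYGON ((1 2, 3 4), (5 6, 7 8))')
print(rings[1][1]['x'])  # 7.0
```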

UPDATE: I updated the code above thanks to another answer by https://stackoverflow.com/users/2635860/mccakici, but it only works if your string has exactly the structure you described in your question.

Community
Sardorbek Imomaliev
  • I guess I could find some way to narrow the string down to a couple of what you have in `my_str`. Thanks! – Michael De Keyser May 21 '14 at 13:55
  • If I get this right, I might strip the `POLYGON ((` part, then strip the `))`, and then use my_str.split('), (') to narrow it down to multiple strings like the one in your answer. This should be right, shouldn't it? :) – Michael De Keyser May 21 '14 at 14:00
  • There's a syntax error at the last line: `{'x', item_split[0], 'y': item_split[1]}` should be `{'x': item_split[0], 'y': item_split[1]}` (`:` instead of `,` after `'x'`). It serves the purpose well but @mccakici's answer is more on the spot. Thanks for your help! – Michael De Keyser May 21 '14 at 14:17

Can you try this?

import ast

POLYGON = '((148210.445767647 172418.761192525, 148183.930888667 172366.054787545, 148183.866770629 172365.316772032, 148184.328078148 172364.737139913, 148220.543522168 172344.042601933, 148221.383518338 172343.971823159), (148221.97916844 172344.568316375, 148244.61381946 172406.651932395, 148244.578100039 172407.422441673, 148244.004662562 172407.938319453, 148211.669446582 172419.255646473, 148210.631989339 172419.018894911, 148210.445767647 172418.761192525))'
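# "(x y, x y)" -> "((x,y),(x,y))"; the trailing ",)" keeps the result a tuple
# of rings even if the string contains only one ring
new_polygon = '(' + POLYGON.replace(', ', '),(').replace(' ', ',') + ',)'


data = ast.literal_eval(new_polygon)  # tuple of rings of (x, y) tuples
result_list = list()
for items in data:          # one ring (polygon) at a time
    sub_list = list()
    for item in items:      # item is an (x, y) tuple
        sub_list.append({
            'x': item[0],
            'y': item[1]
        })
    result_list.append(sub_list)

print(result_list)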
mccakici
  • It is much safer to use `ast.literal_eval` than `eval`. – GWW May 21 '14 at 14:05
  • I added `POLYGON.replace('POLYGON ', '')` on line 3 and was able to make this work perfectly (with my string starting with `POLYGON ((123...`). Thanks a lot @GWW and @mccakici. – Michael De Keyser May 21 '14 at 14:15
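Putting that comment together with the answer, a small reusable function might look like the sketch below. The name `polygon_to_dicts` is hypothetical, and the sketch assumes the exact layout from the question (`POLYGON ` followed by rings written as `x y, x y`); it appends `,)` so that a string with a single ring still yields a list containing one ring.

```python
import ast


def polygon_to_dicts(wkt_text):
    """Parse 'POLYGON ((x y, ...), (x y, ...))' into a list of rings of {'x', 'y'} dicts."""
    body = wkt_text.replace('POLYGON ', '')  # strip the prefix, as in the comment
    # "x y, x y" -> "(x,y),(x,y)"; wrap in a tuple with a trailing comma so a
    # single ring still comes back as a tuple of rings
    literal = '(' + body.replace(', ', '),(').replace(' ', ',') + ',)'
    return [[{'x': x, 'y': y} for x, y in ring] for ring in ast.literal_eval(literal)]


sample = 'POLYGON ((1.5 2.5, 3.5 4.5), (5.5 6.5, 7.5 8.5))'
print(polygon_to_dicts(sample)[1][0]['y'])  # 6.5
```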