7

Several Google Maps products have the notion of polylines, which in terms of underlying data is basically just a sequence of lat/lng points that might for example manifest in a line drawn on a map. The Google Map developer libraries make use of an encoded polyline format that churns out an ASCII string representing the points making up the polyline. This encoded format is then typically decoded with a built in function of the Google libraries or a function written by a third party that implements the decoding algorithm.

The algorithm for encoding polyline points is described in the Encoded Polyline Algorithm Format document. What is not described is the rationale for implementing the algorithm this way, and the significance of each of the individual steps. I'm interested to know whether the thinking/purpose behind implementing the algorithm this way is publicly described anywhere. Two example questions:

  • Do some of the steps have a quantifiable impact on compression and how does this impact vary as a function of the delta between points?
  • Is the summing of values with ASCII 63 a compatibility hack of some sort?

But just in general, a description to go along with the algorithm explaining why the algorithm is implemented the way it is.

Bryce Thomas
  • 10,479
  • 26
  • 77
  • 126
  • Mark McClure used to have a good discussion. That server seems to be down now [snapshot on the wayback machine](http://web.archive.org/web/20130727020758/http://facstaff.unca.edu/mcmcclur/googlemaps/encodepolyline) – geocodezip Mar 18 '14 at 14:26
  • Would be interested as well. I found these code comments useful: http://stackoverflow.com/a/13890455/194609 – Karussell Jul 01 '14 at 11:48

1 Answers1

4

Update: This blog post from James Snook also has the 'valid ascii' range argument and reads logically for other steps I wondered. E.g. the left shifting before storing which makes place for the negative bit as the first bit.

Some explanations I found, not sure if everything is 100% correct.

  • One double value is stored in multiple 5 bits chunks and 0x20 (binary '0010 0000') is used as indication that the next 5 bit entry belongs to the current double.
  • 0x1f (binary '0001 1111') is used as bit mask to throw away other bits
  • I expect that 5 bits are used because the delta of lat or lons are in this range. So that every double value takes only 5 bits on average when done for a lot of examples (but not verified yet).
  • Now, compression is done by assuming nearby double values are very close and creating the difference is nearly 0, so that the results fits in a few bytes. Then this result is stored in a dynamic fashion: store 5 bits and if the value is longer mark with 0x20 and store the next 5 bits and so on. So I guess you can tweak the compression if you try 6 or 4 bits but I guess 5 is a practically reasonable choice.
  • Now regarding the magic 63, this is 0x3f and binary 0011 1111. I'm not sure why they add it. I thought that adding 63 will give some 'better' asci characters (e.g. allowed in XML or in URL) as we skip e.g. 62 which is > but 63 which is ? is really better? At least the first ascii chars are not displayable and have to be avoided. Note that if one would use 64 then one would hit the ascii char 127 for the maximum value of 31 (31+64+32) and this char is not defined in html4. Or is because of a signed char is going from -128 to 127 and we need to store the negative numbers as positive, thus adding the maximum possible negative number?
  • Just for me: here is a link to an official Java implementation with Apache License
Karussell
  • 17,085
  • 16
  • 97
  • 197
  • 2
    The reason for using a 64bit encoding is to be able to send the value as plain text. Clearly there are many other values other than 63 that would work for this. However as far as a very low work solution goes 63 seems to be a fairly good value to add. It avoids quotes and semicolons. One would suspect that avoiding '?' would have been ideal. It's difficult to see a better range to choose for 64 characters in the ascii table. [wikipedia](http://en.wikipedia.org/wiki/Base64) suggests that for a lot of 64bit encodings a more complex map is often chosen. – James Snook Jul 01 '14 at 19:45
  • The link for the _official Java implementation with Apache License_ returns 404. Can you update it? – Peter VARGA May 08 '21 at 19:13