
I've been digging around for a way to validate a list of latitude/longitude pairs. Although I've found some good examples of how to validate pairs and how to validate a list of pairs, I have been unable to write a regex that meets my specific requirements. The requirements are as follows:

  1. Every latitude/longitude pair must contain valid values (solved from the first link)
  2. Every pair must be separated from other pairs by a comma. The second link uses a semicolon, but replacing this with a comma causes some problems in the regex
  3. There must be an even number of coordinates so that every coordinate is paired up
  4. White space after commas is ok
  5. At least 3 coordinate pairs are required so that the coordinates form a polygon (but we don't need to worry about duplicate coordinates), and the final coordinate pair should not be followed by a comma

The following entries should be valid:

32.3078, 64.7505,
27.6648, 81.5158,
18.2208, 66.5901

32.3078, 64.7505,
27.6648, 81.5158,
18.2208, 66.5901,
32.3078, 64.7505,
27.6648, 81.5158,
18.2208, 66.5901

32.3078,64.7505,27.6648,81.5158,18.2208,66.5901

While these should be invalid:

//only 1 pair
32.3078, 64.7505

//no commas separating each pair
32.3078, 64.7505
27.6648, 81.5158
18.2208, 66.5901

//odd number of coordinates
32.3078, 64.7505,
27.6648, 81.5158,
18.2208, 66.5901,
32.3078, 64.7505,
27.6648, 81.5158,
18.2208

//comma after the final pair
32.3078, 64.7505,
27.6648, 81.5158,
18.2208, 66.5901,
izeke

1 Answer


Improved Solution

Edit: thanks to suggestions I have reworked my solution and decided that it makes the most sense to worry about the trailing comma and bounds checking in code instead of in regex. I have been able to reduce the regex's complexity significantly.

The following is my final solution:

^((-?(\d{1,3})(\.\d+)?,\s*){2}){3,}$

The main group matches a single coordinate:

(-?(\d{1,3})(\.\d+)?,\s*)
 -?                        Optional negative sign
   (\d{1,3})               Match any sequence of 1 to 3 digits
            (\.\d+)?       Optionally match a decimal point followed by at least 1 digit
                    ,\s*   Match a comma followed by any amount or type of white-space

This would match "83.1642,", "1,", "987.654321,", and "-91.000,", all of which are valid in format (which is what we want) even though they may be out of range (which we can check in code later).
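As a quick check of the claim above, the single-coordinate group can be tried on its own. This is only a sketch: the `^` and `$` anchors and the extra sample strings are additions for illustration and are not part of the original group.

```javascript
// Single-coordinate group from the answer. The ^ and $ anchors are added here
// (an assumption for this demo) so each sample must match as a whole token.
const coordinate = /^-?(\d{1,3})(\.\d+)?,\s*$/;

// The first four samples come from the text above; the rest are hypothetical
// counter-examples chosen to show what the group rejects.
const samples = [
  '83.1642,',    // plain decimal
  '1,',          // integer
  '987.654321,', // valid format, out of range for any coordinate
  '-91.000,',    // valid format, out of range for a latitude
  '1234,',       // four integer digits
  '12.,',        // decimal point without digits
  '.5,',         // no integer part
  '18.2208',     // missing the trailing comma
];

const results = samples.map((s) => [s, coordinate.test(s)]);
console.log(results);
```

The first four samples pass and the last four fail, which illustrates that the group only constrains the format and leaves range checking to later code.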

And now, for the rest of the regex:

(-?(\d{1,3})(\.\d+)?,\s*){2}
                         {2} Requires exactly two coordinates, i.e. one pair

((-?(\d{1,3})(\.\d+)?,\s*){2}){3,}
                              {3,} Requires 3 or more coordinate pairs

And, of course, ^ and $ denote the start and end of the string respectively. This gives us the regex to (almost!) match all of the strings in our question: ^((-?(\d{1,3})(\.\d+)?,\s*){2}){3,}$

You may notice that this requires a trailing comma, unlike our question. The following strings are considered valid:

32.3078, 64.7505,
27.6648, 81.5158,
18.2208, 66.5901,

32.3078, 64.7505,
27.6648, 81.5158,
18.2208, 66.5901,
32.3078, 64.7505,
27.6648, 81.5158,
18.2208, 66.5901,

32.3078,64.7505,27.6648,81.5158,18.2208,66.5901,
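The trailing-comma behaviour can be seen by running the bare regex against a few inputs. This is a minimal sketch; the variable names are made up for illustration.

```javascript
// The regex from the answer, unchanged.
const pairList = /^((-?(\d{1,3})(\.\d+)?,\s*){2}){3,}$/;

// Hypothetical sample inputs built from the question's coordinates.
const withComma = '32.3078,64.7505,27.6648,81.5158,18.2208,66.5901,';
const withoutComma = '32.3078,64.7505,27.6648,81.5158,18.2208,66.5901';
const multiLine = '32.3078, 64.7505,\n27.6648, 81.5158,\n18.2208, 66.5901,';
const twoPairs = '32.3078, 64.7505, 27.6648, 81.5158,';

console.log(pairList.test(withComma));    // true: every coordinate ends in a comma
console.log(pairList.test(withoutComma)); // false: the last coordinate has no comma
console.log(pairList.test(multiLine));    // true: \s* also absorbs the line breaks
console.log(pairList.test(twoPairs));     // false: fewer than 3 pairs
```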

To address the trailing comma issue, we can simply concatenate a comma onto our string in code, like so:

(string + ',').match(/^((-?(\d{1,3})(\.\d+)?,\s*){2}){3,}$/)

and voila! Our code now matches coordinate pair lists without trailing commas!
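Putting the pieces together, the format check and the deferred bounds check might look like the sketch below. The function name `isValidPolygon` is hypothetical, the `trim()` call is an addition (without it, trailing white space in the input would make the appended comma fail to match), and the latitude-first ordering with limits of ±90 and ±180 is an assumption taken from the old solution's pattern.

```javascript
// Format check from the answer: 3 or more pairs, each coordinate followed by a comma.
const PAIR_LIST = /^((-?(\d{1,3})(\.\d+)?,\s*){2}){3,}$/;

// Hypothetical helper: returns true when the input is a well-formed list of
// at least 3 latitude/longitude pairs whose values are all in range.
function isValidPolygon(input) {
  // Assumption: surrounding white space is harmless, so strip it first.
  const text = input.trim();

  // Append the comma the regex expects after the final coordinate.
  // An input that already ends in a comma then ends in ",," and is rejected.
  if (!PAIR_LIST.test(text + ',')) {
    return false;
  }

  // The format is known to be good, so splitting on commas is safe here.
  const values = text.split(',').map((part) => Number(part.trim()));

  // Assumption: latitude comes first in each pair, longitude second.
  for (let i = 0; i < values.length; i += 2) {
    const latitude = values[i];
    const longitude = values[i + 1];
    if (Math.abs(latitude) > 90 || Math.abs(longitude) > 180) {
      return false;
    }
  }
  return true;
}

console.log(isValidPolygon('32.3078, 64.7505,\n27.6648, 81.5158,\n18.2208, 66.5901'));
console.log(isValidPolygon('32.3078, 64.7505,\n27.6648, 81.5158,\n18.2208, 66.5901,'));
```

The first call should report a valid list and the second, with its trailing comma, an invalid one. Keeping the range test in plain code is what lets the regex stay short.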


Old Solution

I was just about to post my question when I figured it out, so here's my solution:

/^([-+]?([1-8]?\d(\.\d+)?|90(\.0+)?),\s*[-+]?(180(\.0+)?|((1[0-7]\d)|([1-9]?\d))(\.\d+)?))(,\s*[-+]?([1-8]?\d(\.\d+)?|90(\.0+)?),\s*[-+]?(180(\.0+)?|((1[0-7]\d)|([1-9]?\d))(\.\d+)?)){2,}$/

The first half,

([-+]?([1-8]?\d(\.\d+)?|90(\.0+)?),\s*[-+]?(180(\.0+)?|((1[0-7]\d)|([1-9]?\d))(\.\d+)?))

matches the first coordinate pair, without the comma that follows it. The second half,

(,\s*[-+]?([1-8]?\d(\.\d+)?|90(\.0+)?),\s*[-+]?(180(\.0+)?|((1[0-7]\d)|([1-9]?\d))(\.\d+)?)){2,}

matches every subsequent pair, each preceded by its separating comma, and requires at least 2 more coordinate pairs for a minimum total of 3. Because the pattern only ever consumes whole pairs, an even number of coordinates is required inherently.
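One way to make the structure of this pattern easier to see is to assemble it from named pieces. The sketch below is not how the original was written; the names `LAT`, `LON`, `PAIR` and `oldPattern` are made up for illustration, and the resulting expression is intended to be character-for-character the same as the one above.

```javascript
// Latitude: optional sign, then 0-89 with optional decimals, or exactly 90(.0...).
const LAT = String.raw`[-+]?([1-8]?\d(\.\d+)?|90(\.0+)?)`;

// Longitude: optional sign, then exactly 180(.0...), or 0-179 with optional decimals.
const LON = String.raw`[-+]?(180(\.0+)?|((1[0-7]\d)|([1-9]?\d))(\.\d+)?)`;

// One pair: latitude, comma, optional white space, longitude.
const PAIR = String.raw`${LAT},\s*${LON}`;

// First pair, then at least 2 more pairs, each preceded by a separating comma.
const oldPattern = new RegExp(String.raw`^(${PAIR})(,\s*${PAIR}){2,}$`);

console.log(oldPattern.test('32.3078, 64.7505,\n27.6648, 81.5158,\n18.2208, 66.5901'));
console.log(oldPattern.test('91.0, 64.7505,\n27.6648, 81.5158,\n18.2208, 66.5901'));
```

The first call should match and the second should not, because 91.0 is outside the latitude range that the pattern encodes.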

See regex101 for the working query

izeke
  • 4
    I think you are going the wrong way by doing all your checks with regex, in particular checking numeric ranges, and your pattern remains approximate. If I were you, I would only check for allowed characters with regex, and then check the number of (eventual) coordinates and the value ranges, in that order, in code. Doing it this way avoids an overly long pattern, gives more readable code, and avoids a lot of useless capture groups. – Casimir et Hippolyte Jul 30 '19 at 22:53
  • Same comments as above. Nice question, but I'm not upvoting this proposed answer; it is hardly readable. Could you maybe first preprocess your input to insert a newline after every second comma on the same line, if it is not the end of the string? And the same remark about validating values below 180: that's not a nice job to do with regex. – Pac0 Jul 31 '19 at 12:40
  • I find that regex to be pretty bulky and pretty disgusting. Not only is it hard to read, but you don't explain anything for others who may need a similar solution. Not upvote-worthy. I am sure there is a better answer. – Fallenreaper Jul 31 '19 at 12:41
  • 1
    Thanks to your suggestions, I've added an improved solution. I appreciate the pointers! – izeke Aug 01 '19 at 13:05