1

I am trying to extract coordinates in various formats from Reddit comments. I want to be able to extract the two coordinate formats supported by Google Maps:

73.180633, -98.100802 and 73°10'50.3"N 98°06'02.9"W

I am able to extract coordinates in the first format with this expression:

([-+]?\d{1,2}[.]\d+),\s*([-+]?\d{1,3}[.]\d+)
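
For reference, a minimal sketch of how this expression might be applied in Python. The choice of Python, the names `DECIMAL_RE` and `find_decimal_coords`, and the sample comment text are assumptions made for illustration only.

```python
import re

# The decimal-degree pattern from above, unchanged.
DECIMAL_RE = re.compile(r"([-+]?\d{1,2}[.]\d+),\s*([-+]?\d{1,3}[.]\d+)")


def find_decimal_coords(text):
    """Return a (lat, lon) float pair for every decimal-degree match in text."""
    return [(float(lat), float(lon)) for lat, lon in DECIMAL_RE.findall(text)]


# Hypothetical comment text, not taken from real data.
comment = "Check out 73.180633, -98.100802 on the map"
print(find_decimal_coords(comment))
```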

However, I have not been able to successfully find or make a pattern to match the second format.

Heikki
rgmcode

2 Answers

1

This matches the second format: \d{1,3}°\d{1,3}'\d{1,3}\.\d\"[NS]\s\d{1,3}°\d{1,3}'\d{1,3}\.\d\"[EW]

See: https://regex101.com/r/LN1igj/8
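
A minimal Python sketch of a degrees/minutes/seconds pattern along the same lines, with capture groups added so the individual parts can be read back. It is a variant, not the exact expression above: it assumes minutes and seconds have at most two integer digits, allows any number of decimal digits on the seconds, and tolerates missing whitespace between the two halves. The names `DMS_RE` and `find_dms` are hypothetical.

```python
import re

# Verbose mode lets the latitude and longitude halves sit on separate lines.
DMS_RE = re.compile(
    r"""
    (\d{1,3})°(\d{1,2})'(\d{1,2}(?:\.\d+)?)"([NS])   # latitude: deg, min, sec, hemisphere
    \s*
    (\d{1,3})°(\d{1,2})'(\d{1,2}(?:\.\d+)?)"([EW])   # longitude: deg, min, sec, hemisphere
    """,
    re.VERBOSE,
)


def find_dms(text):
    """Return one 8-tuple of strings per match: latitude parts, then longitude parts."""
    return DMS_RE.findall(text)


print(find_dms('73°10\'50.3"N 98°06\'02.9"W'))
print(find_dms('36°10\'13.6"N 115°08\'23.6"W'))
```

Because the degree field accepts one to three digits, a longitude of 100 or more still matches.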

The pattern contains two very similar groups, and apparently it is not possible to simplify that. See "How to capture multiple repeated groups?".

To capture both formats in one regex: (?:((?:[-+]?\d{1,2}[.]\d+),\s*(?:[-+]?\d{1,3}[.]\d+))|(\d{1,3}°\d{1,3}'\d{1,3}\.\d\"[NS]\s\d{1,3}°\d{1,3}'\d{1,3}\.\d\"[EW]))

See: https://regex101.com/r/LN1igj/7

But this is not really readable anymore. I would advise solving that in code.
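
One way "solving it in code" could look, as a rough Python sketch: run the two patterns separately and normalise every hit to signed decimal degrees. The DMS pattern here is a loosened variant (optional decimals on the seconds, optional whitespace), and all function and constant names are hypothetical.

```python
import re

# Decimal-degree pattern from the question.
DECIMAL_RE = re.compile(r"([-+]?\d{1,2}[.]\d+),\s*([-+]?\d{1,3}[.]\d+)")

# Assumed DMS variant with capture groups for each part.
DMS_RE = re.compile(
    r"(\d{1,3})°(\d{1,2})'(\d{1,2}(?:\.\d+)?)\"([NS])\s*"
    r"(\d{1,3})°(\d{1,2})'(\d{1,2}(?:\.\d+)?)\"([EW])"
)


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds strings to signed decimal degrees."""
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # South and west are negative by convention.
    return -value if hemisphere in "SW" else value


def extract_coordinates(text):
    """Return (lat, lon) pairs in decimal degrees; decimal matches come first."""
    pairs = [(float(lat), float(lon)) for lat, lon in DECIMAL_RE.findall(text)]
    for d1, m1, s1, h1, d2, m2, s2, h2 in DMS_RE.findall(text):
        pairs.append((dms_to_decimal(d1, m1, s1, h1),
                      dms_to_decimal(d2, m2, s2, h2)))
    # Keep only latitudes within +/-90 and longitudes within +/-180.
    return [(lat, lon) for lat, lon in pairs
            if abs(lat) <= 90 and abs(lon) <= 180]


print(extract_coordinates('36°10\'13.6"N 115°08\'23.6"W'))
```

Keeping the two expressions apart means each stays short enough to read, and the range check happens in ordinary code rather than inside the regex. Note that results are grouped by format, not ordered by position in the text.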

Christian Baumann
  • Happy to help. Thx for accepting the answer and upvoting it. – Christian Baumann Oct 02 '20 at 05:50
  • Longitude goes from -180 to +180. So I think at least one of your {2}'s needs to be {2,3} – Frank Yellin Oct 02 '20 at 05:57
  • Is there a way I can combine my two expressions into a single one? I tried using an or (exp1 | exp2) but that did not work. Neither format matched – rgmcode Oct 02 '20 at 05:58
  • @ChristianBaumann doesn't seem to work for `36°10'13.6"N 115°08'23.6"W`, tested using the combined expression – rgmcode Oct 02 '20 at 07:07
  • @rgmcode Updated: `(?:((?:[-+]?\d{1,2}[.]\d+),\s*(?:[-+]?\d{1,3}[.]\d+))|(\d{2,3}°\d{2}'\d{2,3}\.\d\"[N|S]\s\d{2,3}°\d{2}'\d{2,3}\.\d\"[E|W]))` https://regex101.com/r/LN1igj/4 – Christian Baumann Oct 02 '20 at 07:10
  • @rgmcode What are valid ranges for the 4 numbers: `[1]°[2]'[3].[4]"N` ? – Christian Baumann Oct 02 '20 at 07:13
  • @ChristianBaumann The first term is <=90 for latitude and <=180 for longitude; I think everything else should be <=60. The pattern seems to fail to match when the longitude degree term is >=100 – rgmcode Oct 02 '20 at 07:26
0
  • Latitudes range from 90°S to 90°N, and longitudes from 0° to 180° east or west.
  • Besides seconds, degrees and minutes can also carry a decimal fraction instead of further subunits.
  • Coordinates can also be written as plain decimal numbers with a +/- sign.

Please try the regex below.

(^| )(-?\d{1,2}(\.\d+)?(?=\s*,?\s*)[\s,]+-?\d{1,3}(\.\d+)?|\d{1,2}(\.\d+°|°(\d{1,2}(\.\d+'|'(\d{1,2}(\.\d+)?\")?))?)[NS](?=\s*,?\s*)[\s,]+\d{1,3}(\.\d+°|°(\d{1,2}(\.\d+'|'(\d{1,2}(\.\d+)?\")?))?)[EW])
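
A small Python sketch of how the expression above might be used. The pattern is copied unchanged and only split across adjacent string literals for width; `COORD_RE`, `find_coords` and the sample strings are assumptions for illustration. Group 2 holds the coordinate itself, since group 1 only captures the leading space or start of string.

```python
import re

# The pattern above, verbatim, split over several raw strings.
COORD_RE = re.compile(
    r"(^| )(-?\d{1,2}(\.\d+)?(?=\s*,?\s*)[\s,]+-?\d{1,3}(\.\d+)?"
    r"|\d{1,2}(\.\d+°|°(\d{1,2}(\.\d+'|'(\d{1,2}(\.\d+)?\")?))?)[NS]"
    r"(?=\s*,?\s*)[\s,]+"
    r"\d{1,3}(\.\d+°|°(\d{1,2}(\.\d+'|'(\d{1,2}(\.\d+)?\")?))?)[EW])"
)


def find_coords(text):
    """Return the matched coordinate strings (group 2 of each match)."""
    return [m.group(2) for m in COORD_RE.finditer(text)]


samples = [
    "at 73.180633, -98.100802 ok",        # signed decimal degrees
    '36°10\'13.6"N 115°08\'23.6"W',       # degrees, minutes, seconds
    "36.17°N 115.14°W",                   # decimal degrees with hemisphere
]
for sample in samples:
    print(find_coords(sample))
```

One trade-off to be aware of: because the decimal part is optional in the first alternative, two plain integers separated by a space (for example "12 34") also match, so some post-filtering may be needed on free-form text.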


Liju
  • Thank you! This also worked on other formats, such as those on Google Earth (which for some reason uses different DMS and decimal formats than Maps), which is super useful – rgmcode Oct 03 '20 at 09:06