I would like to extract coordinates from a string in which some character sequences (such as '' and line breaks) can appear anywhere. My long-term goal is to replace each pair of coordinates in the string with the name of the city it corresponds to.

Here is an example string (it's wikicode):

''These are the coordinates'' : 

''Long. 17d. 6′. 8″. lat. 47d. 28′. 8″
''Lon. ''36d. 70′. 80″. ''lat. 45d. 20′. 5″
''Long. 17d. 6′. 8″.
lat. 47d. 28′. 8″

(I want all 3 coordinates to match.)

And here is the PCRE regex that I have come up with so far:

/''Long?(?:'')?\.(?:'')? .*?(?P<Degrees>\d+).*?d.+?(?P<Minutes>\d+)′.*?(?P<Seconds>\d+)″.*?(?:'')?lat(?:'')?\.(?:'')?.*?(?P<Degrees>\d+).*?d.+?(?P<Minutes>\d+)′.*?(?P<Seconds>\d+)″.*?/gmJ

regex101 link

The last coordinates don't match my regex because there is a line break in the middle. At first I thought, "Well, I'll just add an optional line break to the regex", but then I realized that line breaks, just like '', can potentially appear anywhere in the string. Adding optional line breaks and optional '' after each character (as I started to do in the regex) would make it completely unreadable and unmaintainable.

What are my options here? In theory, I could just remove '' and line breaks from the string, but I actually need them in other parts of it (for instance in the substring ''These are the coordinates'' : these '' must stay).
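To illustrate one possible option, here is a minimal sketch that builds the pattern from a list of tokens and joins them with a single "noise" sub-pattern, so the optional '' and line breaks are written only once. It uses Python's `re` module rather than PCRE, which is an assumption on my part; the same idea should carry over to any host language that can build a pattern string. The names `NOISE`, `dms`, `COORD` and the group names are hypothetical, and the sketch assumes that noise never splits a number in the middle.

```python
import re

# Hypothetical "noise" pattern: any run of '' markers or whitespace
# (including line breaks) that may sit between two meaningful tokens.
NOISE = r"(?:''|\s)*"

def dms(prefix):
    """Tokens for one degrees/minutes/seconds value, e.g. 17d. 6′. 8″"""
    return [
        rf"(?P<{prefix}_deg>\d+)", "d", r"\.?",
        rf"(?P<{prefix}_min>\d+)", "′", r"\.?",
        rf"(?P<{prefix}_sec>\d+)", "″",
    ]

# Readable token list; NOISE is inserted between every pair of tokens.
tokens = (
    ["L", "o", "n", "g?", r"\."] + dms("lon")
    + [r"\.?", "l", "a", "t", r"\."] + dms("lat")
)
COORD = re.compile(NOISE.join(tokens))

text = (
    "''These are the coordinates'' : \n\n"
    "''Long. 17d. 6′. 8″. lat. 47d. 28′. 8″\n"
    "''Lon. ''36d. 70′. 80″. ''lat. 45d. 20′. 5″\n"
    "''Long. 17d. 6′. 8″.\n"
    "lat. 47d. 28′. 8″\n"
)

# One dict of captured groups per coordinate pair found.
matches = [m.groupdict() for m in COORD.finditer(text)]

# Replacing on the original string leaves every '' outside a match untouched.
# "CITY" is a placeholder for a real lookup of the city name.
replaced = COORD.sub("CITY", text)

for m in matches:
    print(m)
```

Because the substitution runs on the original string, the '' around "These are the coordinates" is preserved; only the text covered by a match is replaced.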

Bruno Pérel