Legal descriptions often contain a compound Section-Township-Range (S-T-R). A standard S-T-R looks like 18-8N-12E, where each number is 1 or 2 digits and the township (T) and range (R) portions each end with a single uppercase direction letter. Sometimes the legal is compounded with several comma-separated leading sections, such as ' 5,7,8,18-8N-12E ', meaning sections 5, 7, 8, and 18 all lie in the same township-range, 8N-12E. I'm trying to find the best way to capture this kind of string, where the number of comma-separated sections can vary but the list must end with the exact S-T-R pattern. The goal is to capture both the repeating section list and the exact S-T-R pattern that follows it.
In this example:

text = ' 5,7,8,18-8N-12E ' 

I'm testing with the following regex

>>> re.findall(r'(\d{1,2}(,\d{1,2})+)+,(\d{1,2}-\d{1,2}[A-Z]{1}-\d{1,2}[A-Z]{1})', text)
[('5,7,8', ',8', '18-8N-12E')]

which works and I could parse what I'm wanting from it, but I suspect there is a better way, so what is the best approach to capturing this?
Thanks

observer7
  • Consider `re.split(r',(?=[^,]*,?[^,]*$)', str.strip())`. That returns `['5,7', '8', '18-8N-12E']`, which is not exactly what you want (`'8'` rather than `',8'`), but maybe it's close enough or even preferable. This splits the string on every comma that is followed later in the string by at most one comma. [Demo](https://tio.run/##K6gsycjPM/7/PzO3IL@oRKEolYuruKRIwVZBXcFUx1zHQsfQQtfCT9fQyFVBnasovQIoU6Suo2FvGx2nE6ulYw@mVDTVubgKijLzSjSKUvWKC3IygYz0Ch0FoFF6QJxZoKGpqcn1/z8A). Note that I do not fully understand what you want, so I cannot attest to the universal validity of this code. – Cary Swoveland May 26 '23 at 17:13
  • The linked question, used as grounds to close this question, does not answer this question. "Capturing repeating patterns" answers the question as much as saying that regex answers the question. Just yet another case of power-wielding trolling. – observer7 May 26 '23 at 19:15
  • Returning to answer, since the linked answer has nothing whatsoever to contribute to this question. The easiest way I came up with is this, which can then be split into tokens for further processing: `>>> re.findall(r'[\d,]*\d{1,2}-\d{1,2}[A-Z]{1}-\d{1,2}[A-Z]{1}', text) ['5,7,8,18-8N-12E'] >>> re.findall(r'[\d,]*\d{1,2}-\d{1,2}[A-Z]{1}-\d{1,2}[A-Z]{1}', text)[0].split(',') ['5', '7', '8', '18-8N-12E']` – observer7 May 26 '23 at 20:32
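
The approach in the last comment (match the whole compound string, then split it) could be tightened a little. The original attempt returns `',8'` in its second group because a repeated capture group keeps only its last iteration, so it seems simpler to capture the whole section list as one group and split it afterwards. The following is only a minimal sketch under assumptions: the names `STR_PATTERN` and `parse_str` and the group names `sections`, `township`, and `range` are hypothetical labels, not anything from the question, and the lookarounds assume a match should not begin or end in the middle of a longer number or word.

```python
import re

# Hypothetical pattern and group names, chosen for illustration only.
STR_PATTERN = re.compile(
    r"(?<![\d,])"                          # assumption: do not start inside a longer number or list
    r"(?P<sections>\d{1,2}(?:,\d{1,2})*)"  # one or more comma-separated sections
    r"-(?P<township>\d{1,2}[A-Z])"         # township: 1-2 digits plus a direction letter
    r"-(?P<range>\d{1,2}[A-Z])"            # range: 1-2 digits plus a direction letter
    r"(?![A-Za-z\d])"                      # assumption: do not end inside a longer token
)


def parse_str(text):
    """Return a list of (sections, township, range) tuples found in text."""
    results = []
    for match in STR_PATTERN.finditer(text):
        # Split the captured list here instead of relying on a repeated group.
        sections = [int(s) for s in match.group("sections").split(",")]
        results.append((sections, match.group("township"), match.group("range")))
    return results


print(parse_str(" 5,7,8,18-8N-12E "))
# [([5, 7, 8, 18], '8N', '12E')]
```

Because the section list uses `*` rather than `+`, the same pattern also accepts a plain S-T-R such as `18-8N-12E`, which may or may not be desirable depending on the data.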
