
I needed to join single-letter words that are separated only by whitespace, which is straightforward. I applied the following regex pattern:

res = re.sub(r'(?<=\b[A-Za-z]\b)\s+(?=[a-zA-Z]\b)|\s+$',
             '',
             text,
             0,
             re.IGNORECASE)

This joins the single-letter words in strings such as:

A B SCHOOL DISTRICT
J B UNIVERISTY
X Z SCHOOL LAB

which become:

AB SCHOOL DISTRICT
JB UNIVERISTY
XZ SCHOOL LAB

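However, when I extended this to words of one or two letters, I ran into a limitation: lookbehind in Python's re module must be fixed-width, so it does not accept a quantifier such as {1,2}. I therefore replaced the lookbehind with a capturing group and applied the following regex: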

res = re.sub(r'(\b[A-Za-z]{1,2}\b)\s+(?=[a-zA-Z]{1,2}\b)|\s+$',
             r'\1',
             text,
             0,
             re.IGNORECASE)

For example, the following strings:

AB XY SCHOOL DISTRICT
JB ZC UNIVERISTY
XZ AB SCHOOL LAB

which become:

ABXY SCHOOL DISTRICT
JBZC UNIVERISTY
XZAB SCHOOL LAB
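
The lookbehind limitation mentioned above can be reproduced with a short check. This is only a sketch: the helper name `compiles` is made up for illustration, and the exact error text may differ between Python versions, so it is printed rather than relied upon.

```python
import re

# Variable-width lookbehind: {1,2} can match one or two characters,
# so the standard-library re module rejects it at compile time.
variable_width = r'(?<=\b[A-Za-z]{1,2}\b)\s+(?=[A-Za-z]{1,2}\b)'

# Fixed-width lookbehind, as in the first pattern: always exactly one character.
fixed_width = r'(?<=\b[A-Za-z]\b)\s+(?=[A-Za-z]\b)'


def compiles(pattern):
    """Hypothetical helper: return (True, None) if re accepts the pattern,
    otherwise (False, error message)."""
    try:
        re.compile(pattern)
        return True, None
    except re.error as exc:
        return False, str(exc)


ok, message = compiles(variable_width)
print(ok, message)            # rejected, with the reason reported by re
print(compiles(fixed_width))  # accepted
```

The lookahead is not affected: only the lookbehind has to be fixed-width in `re`.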

The second pattern already does the job, but I wonder whether it is the best way to do it. Is there a better approach to the problem of using quantifiers in a lookbehind?
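
One possible alternative that stays within the standard library, given here as a sketch rather than a definitive answer, is to write one fixed-width lookbehind per allowed length and alternate them outside the lookbehind. The names `PATTERN` and `join_short_words` are made up for illustration.

```python
import re

# One fixed-width lookbehind per allowed length (1 or 2), alternated
# outside the lookbehinds so that each one has a constant width.
PATTERN = re.compile(
    r'(?:(?<=\b[A-Za-z])|(?<=\b[A-Za-z]{2}))'  # preceded by a 1- or 2-letter word
    r'\s+'                                     # the whitespace to remove
    r'(?=[A-Za-z]{1,2}\b)'                     # followed by a 1- or 2-letter word
    r'|\s+$'                                   # or trailing whitespace
)


def join_short_words(text):
    """Remove whitespace between adjacent words of one or two letters."""
    return PATTERN.sub('', text)


for line in ['AB XY SCHOOL DISTRICT', 'JB ZC UNIVERISTY', 'A B SCHOOL DISTRICT ']:
    print(join_short_words(line))
```

This needs no backreference in the replacement, but it grows by one lookbehind for every additional length, so the capturing-group version may still be the simpler choice for wider ranges.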

Thanks

John Barton
  • Thanks @Wiktor Stribiżew. I am not sure whether the second answer in https://stackoverflow.com/questions/24987403/variable-width-lookbehind-issue-in-python can be applied to this question, since I cannot move the pattern [A-Za-z]{1,2} out of the lookbehind. In addition, the first answer relies on an external package called regex. – John Barton Apr 22 '20 at 17:26
  • If you ask how it can be done *better*, then I will repeat: use the PyPI regex module. All your recent questions beg for this module. – Wiktor Stribiżew Apr 22 '20 at 17:31
