I have a working block of code, but something tells me it's not the most efficient.

  • start with a few strings
  • if a string contains DBA or ATTN followed by at least 2 characters, capture everything from DBA or ATTN to the end of the line, and don't check the remaining strings
  • strip out what was just captured

What I have below seems to do that just fine.

import re

alt_name = ""

name1 = "JUST A NAME"
name2 = "UNITED STATES STORE DBA USA INC"
name3 = "ANOTHER FIELD"

regex = re.compile(r"\b(DBA\b.{2,})|\b(ATTN\b.{2,})")
if re.search(regex, name1):
    match = re.search(regex, name1)
    alt_name = match.group(0)
    name1 = re.sub(regex, "", name1)
elif re.search(regex, name2):
    match = re.search(regex, name2)
    alt_name = match.group(0)
    name2 = re.sub(regex, "", name2)
elif re.search(regex, name3):
    match = re.search(regex, name3)
    alt_name = match.group(0)
    name3 = re.sub(regex, "", name3)

print(name1)
print(name2)
print(name3)
print(alt_name)

Is there a way to capture and strip with just 1 line instead of searching, matching and then subbing? I'm looking for efficiency and readability. Just making it short to be clever isn't what I'm going for. Maybe this is just the way to do it?

Samuel Dion-Girardeau
sniperd
  • Do you just want [`re.sub(r"\s*\b(?:DBA|ATTN)\b.{2,}", "", name2)`](http://rextester.com/WWQWZ74755)? – Wiktor Stribiżew May 29 '18 at 14:06
  • That's a better regex (and I'll use it!) it's the rest of the code. There isn't a tricky python way instead of having 4 lines to search, capture, and strip to do it in one? – sniperd May 29 '18 at 14:10
  • Not sure what you mean. [Maybe this](http://rextester.com/XFL37949)? This example uses a `global` keyword, but you may use a class variable. – Wiktor Stribiżew May 29 '18 at 14:15
  • Ah, val = m.group(0).lstrip() I think that is what I'm looking for. Thank you! – sniperd May 29 '18 at 14:17

1 Answer

You can pass a method as the replacement argument to re.sub. Inside that method, save the matched text into a variable, and if you want to remove the match, just return an empty string.

However, the pattern you have must be re-written to be more efficient:

r"\s*\b(?:DBA|ATTN)\b.{2,}"

See the regex demo.

  • \s* - 0+ whitespace chars
  • \b - a word boundary
  • (?:DBA|ATTN) - either the DBA or the ATTN substring
  • \b - a word boundary
  • .{2,} - 2 or more chars other than LF symbols, as many as possible.
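As a quick check of the breakdown above, here is a minimal sketch that runs the pattern against the three sample strings from the question. The names `pattern`, `samples` and `results` are illustrative only and are not part of the original code.

```python
import re

# Pattern from the answer: optional leading whitespace, then DBA or ATTN
# as a whole word, then at least 2 more characters up to the end of the line.
pattern = re.compile(r"\s*\b(?:DBA|ATTN)\b.{2,}")

# Sample strings taken from the question.
samples = ["JUST A NAME", "UNITED STATES STORE DBA USA INC", "ANOTHER FIELD"]

results = []
for s in samples:
    m = pattern.search(s)
    # Captured text without the leading whitespace, or "" when nothing matched.
    captured = m.group(0).lstrip() if m else ""
    # The input with the matched part removed.
    stripped = pattern.sub("", s)
    results.append((stripped, captured))

for stripped, captured in results:
    print(repr(stripped), repr(captured))
```

Only the second string contains DBA, so it is the only one that gets split into a stripped name and a captured part.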

Here is an example:

import re

class RegexMatcher:
    val = ''
    rx = re.compile(r"\s*\b(?:DBA|ATTN)\b.{2,}")

    def runsub(self, m):
        self.val = m.group(0).lstrip()
        return ""

    def process(self, s):
        return self.rx.sub(self.runsub, s)

rm = RegexMatcher()
name = "UNITED STATES STORE DBA USA INC"
print(rm.process(name))
print(rm.val)

See the Python demo.

Maybe it makes more sense to make val a list variable, and then .append(m.group(0).lstrip()).
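A rough sketch of that list-based variant could look like the following. The class name `ListRegexMatcher`, the attribute `vals`, the helper `_collect` and the extra input "ATTN JOHN SMITH" are assumptions made up for illustration; only the pattern and the overall approach come from the example above.

```python
import re

class ListRegexMatcher:
    # Same pattern as in the answer, compiled once for the class.
    rx = re.compile(r"\s*\b(?:DBA|ATTN)\b.{2,}")

    def __init__(self):
        # Per-instance list, so separate matchers do not share captures.
        self.vals = []

    def _collect(self, m):
        # Store the matched text (minus leading whitespace) and
        # return "" so that re.sub removes it from the input.
        self.vals.append(m.group(0).lstrip())
        return ""

    def process(self, s):
        return self.rx.sub(self._collect, s)

lm = ListRegexMatcher()
# The third string is a made-up input to show the ATTN branch.
names = ["JUST A NAME", "UNITED STATES STORE DBA USA INC", "ATTN JOHN SMITH"]
cleaned = [lm.process(n) for n in names]

print(cleaned)
print(lm.vals)
```

With a list, one matcher instance can process several strings in a row without a later call overwriting an earlier capture.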

Wiktor Stribiżew
  • Just one question (?:DBA|ATTN) doesn't the ?: make it non-capturing? Although when I run it, it captures. – sniperd May 29 '18 at 17:37
  • @sniperd It is a [**non-capturing group**](https://stackoverflow.com/questions/3512471/what-is-a-non-capturing-group-what-does-do), it *matches*, but does NOT *capture*. – Wiktor Stribiżew May 29 '18 at 17:42
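To illustrate the distinction discussed in these comments, here is a small sketch comparing a non-capturing group with a capturing one. The variable names are illustrative only. `group(0)` always returns the whole match, which is probably why the text still appears to be "captured"; the difference only shows up in `groups()`.

```python
import re

text = "UNITED STATES STORE DBA USA INC"

# (?:...) groups the alternatives without creating a numbered group.
non_capturing = re.compile(r"\b(?:DBA|ATTN)\b.{2,}")
# (...) groups the alternatives and stores what they matched as group 1.
capturing = re.compile(r"\b(DBA|ATTN)\b.{2,}")

m1 = non_capturing.search(text)
m2 = capturing.search(text)

print(m1.group(0))   # whole match, available regardless of groups
print(m1.groups())   # empty tuple: nothing was captured
print(m2.groups())   # one captured group
```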