
I need to find out whether a name starts with any of the prefixes in a list and, if so, remove that prefix, like this:

if name[:2] in ["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_"]:
    name = name[2:]

The above only works for prefixes of length two. I need the same functionality for variable-length prefixes.

How can this be done efficiently (little code and good performance)?

A for loop iterating over each prefix and then checking name.startswith(prefix) to finally slice the name according to the length of the prefix works, but it's a lot of code, probably inefficient, and "non-Pythonic".

Does anybody have a nice solution?

BartoszKP
Kawu
  • The solution you describe is pretty decent. – brc Sep 24 '11 at 15:30
  • It isn't a lot of code to do, just a lot of code to make clear. – Ignacio Vazquez-Abrams Sep 24 '11 at 15:33
  • @brc the issue was that the prefixes could be multiple characters, so it wouldn't be sufficient to check `name[:2]` – Foo Bah Sep 24 '11 at 15:37
  • @FooBah No, the second solution of using `startswith` etc. – brc Sep 24 '11 at 15:40
  • `A for loop iterating over each prefix and then checking name.startswith(prefix) to finally slice the name according to the length of the prefix works` That sounds pretty Pythonic to me. That shouldn't be more than 5 or 10 lines of code. "Pythonic" doesn't mean it has to be done in 1 line. – Falmarri Sep 24 '11 at 17:11
  • I know this is a really old question, but what would you want to happen if the name starts with multiple prefixes in the list, where each of the prefixes has a different length? E.g. name = "amazing", list = ['am', 'ama', 'amaz']. Should it remove 2, 3, or 4 characters? – KrisF Sep 29 '14 at 02:39
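KrisF's question matters whenever one prefix is itself the start of another. A minimal sketch, assuming the longest matching prefix should win (the question does not say); the function name `strip_longest_prefix` is hypothetical:

```python
def strip_longest_prefix(name, prefixes):
    """Remove the longest prefix in `prefixes` that `name` starts with.

    Returns `name` unchanged if no prefix matches.
    """
    # Trying longer prefixes first means "amaz" is checked before "ama" and "am".
    for prefix in sorted(prefixes, key=len, reverse=True):
        if name.startswith(prefix):
            return name[len(prefix):]
    return name


print(strip_longest_prefix("amazing", ["am", "ama", "amaz"]))  # -> ing
print(strip_longest_prefix("amazing", ["xy"]))                 # -> amazing
```

Sorting costs a little on every call; if the prefix list is fixed, it can be sorted once up front instead.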

11 Answers


str.startswith(prefix[, start[, end]])

Return True if string starts with the prefix, otherwise return False. prefix can also be a tuple of prefixes to look for. With optional start, test string beginning at that position. With optional end, stop comparing string at that position.

$ ipython
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: prefixes = ("i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_")

In [2]: 'test'.startswith(prefixes)
Out[2]: False

In [3]: 'i_'.startswith(prefixes)
Out[3]: True

In [4]: 'd_a'.startswith(prefixes)
Out[4]: True
dm03514
  • I also need to remove the found prefix from the name if it starts with one of the prefixes. Maybe the question was a little inaccurate; however, I still like the fact that `str.startswith` also accepts a tuple. – Kawu Sep 24 '11 at 16:18
  • Yes, because it accepts tuples it might be the cleanest implementation. – dm03514 Sep 24 '11 at 23:00
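Building on the tuple form, one possible way to also remove the prefix: use the single tuple call to reject non-matching names, and loop to find which prefix matched only when there is one. This is a sketch; `strip_prefix` is a hypothetical helper name, and the first matching prefix in tuple order is the one removed:

```python
def strip_prefix(name, prefixes):
    """Strip the first matching prefix (in tuple order) from `name`."""
    # str.startswith accepts a tuple, so names without any prefix
    # are rejected with a single call.
    if not name.startswith(prefixes):
        return name
    # Some prefix matches; find which one so we know how much to cut.
    for prefix in prefixes:
        if name.startswith(prefix):
            return name[len(prefix):]
    return name  # not reached, but keeps every path returning a str


prefixes = ("i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_")
print(strip_prefix("d_a", prefixes))   # -> a
print(strip_prefix("test", prefixes))  # -> test
```

On Python 3.9 and later, `name[len(prefix):]` can also be written `name.removeprefix(prefix)`.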

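A bit hard to read, but this works (in Python 3, `filter` returns an iterator, so take the first element with `next`; `prefixes` must be a list for `+ ['']` to work):

name = name[len(next(filter(name.startswith, prefixes + ['']))):]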
Vaughn Cato
  • Very nice, this even ignores unprefixed names. Perfect. – Kawu Sep 26 '11 at 12:07
  • For those more used to list comprehensions, this is equivalent to: `name=name[len([prefix for prefix in prefixes+[''] if name.startswith(prefix)][0]):]` – Filipe Correia Sep 11 '12 at 11:40
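The `+ ['']` in the one-liner is a sentinel: the empty string matches every name, so there is always a first element and unprefixed names lose zero characters. A sketch of the same idea using `next()` with a default instead of a sentinel, which stops at the first match and accepts a tuple as well as a list (the helper name `first_prefix` is hypothetical):

```python
def first_prefix(name, prefixes):
    # next() returns the first prefix that matches, or '' if none does,
    # so the default plays the role of the [''] sentinel.
    return next((p for p in prefixes if name.startswith(p)), '')


prefixes = ("i_", "c_", "m_")
for name in ["i_count", "total"]:
    print(name[len(first_prefix(name, prefixes)):])
# prints "count", then "total"
```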
for prefix in prefixes:
    if name.startswith(prefix):
        name=name[len(prefix):]
        break
unutbu

Regexes will likely give you the best speed:

import re

prefixes = ["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_", "also_longer_"]
re_prefixes = "|".join(re.escape(p) for p in prefixes)

m = re.match(re_prefixes, my_string)
if m:
    # re.match only matches at the start, so m.end() is the length of the prefix
    my_string = my_string[m.end():]
Ned Batchelder
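One caveat with a joined pattern: Python's `re` tries alternatives from left to right and keeps the first one that matches, not the longest. If one prefix can be the start of another (say `b_` and `b_x_`), sorting the prefixes by length, longest first, before joining makes the longest one win. A sketch with illustrative prefixes; `strip_regex_prefix` is a hypothetical name:

```python
import re

prefixes = ["b_", "b_x_", "i_"]

# Longest alternatives first, so "b_x_" is tried before "b_".
ordered = sorted(prefixes, key=len, reverse=True)
pattern = re.compile("|".join(re.escape(p) for p in ordered))


def strip_regex_prefix(name):
    # pattern.match() only matches at position 0, so m.end() is the prefix length.
    m = pattern.match(name)
    return name[m.end():] if m else name


print(strip_regex_prefix("b_x_value"))  # -> value
print(strip_regex_prefix("b_value"))    # -> value
print(strip_regex_prefix("other"))      # -> other
```

Without the sort, the pattern would be `b_|b_x_|i_` and `"b_x_value"` would only lose `b_`.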

If you define prefix to be the characters before an underscore, then you can check for

if name.partition("_")[0] in ["i", "c", "m", "l", "d", "t", "e", "b", "foo"] and name.partition("_")[1] == "_":
    name = name.partition("_")[2]
Foo Bah
  • I'd use `"_" in name` as your second clause to avoid partitioning the string twice, and in fact I'd put that clause first to avoid partitioning the string at all if there's no underscore in it. But good thinking. – kindall Sep 24 '11 at 16:07
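kindall's suggestion, sketched: test for the underscore first and partition only once. The set used for the membership test is an addition of this sketch (a list works too), and `strip_underscore_prefix` is a hypothetical name:

```python
PREFIXES = {"i", "c", "m", "l", "d", "t", "e", "b", "foo"}


def strip_underscore_prefix(name):
    # Cheap check first: no underscore means there is no prefix to strip.
    if "_" not in name:
        return name
    # Partition only once and reuse the pieces.
    head, _, tail = name.partition("_")
    return tail if head in PREFIXES else name


print(strip_underscore_prefix("foo_bar"))  # -> bar
print(strip_underscore_prefix("x_bar"))    # -> x_bar
print(strip_underscore_prefix("plain"))    # -> plain
```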

What about using filter?

prefs = ["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_"]
names = list(filter(lambda item: not any(item.startswith(prefix) for prefix in prefs), names))

Note that the comparison of each list item against the prefixes efficiently halts on the first match. This behaviour is guaranteed by the `any` function, which returns as soon as it finds a True value, e.g.:

def gen():
    print("yielding False")
    yield False
    print("yielding True")
    yield True
    print("yielding False again")
    yield False

>>> any(gen())  # the last two lines of gen() are never executed
yielding False
yielding True
True

Or, using re.match instead of startswith:

import re
patt = '|'.join(map(re.escape, ["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_"]))
names = list(filter(lambda item: not re.match(patt, item), names))
etuardu

Regex, tested:

import re

def make_multi_prefix_matcher(prefixes):
    regex_text = "|".join(re.escape(p) for p in prefixes)
    print(repr(regex_text))
    return re.compile(regex_text).match

pfxs = "x ya foobar foo a|b z.".split()
names = "xenon yadda yeti food foob foobarre foo a|b a b z.yx zebra".split()

matcher = make_multi_prefix_matcher(pfxs)
for name in names:
    m = matcher(name)
    if not m:
        print(repr(name), "no match")
        continue
    n = m.end()
    print(repr(name), n, repr(name[n:]))

Output:

'x|ya|foobar|foo|a\\|b|z\\.'
'xenon' 1 'enon'
'yadda' 2 'dda'
'yeti' no match
'food' 3 'd'
'foob' 3 'b'
'foobarre' 6 're'
'foo' 3 ''
'a|b' 3 ''
'a' no match
'b' no match
'z.yx' 2 'yx'
'zebra' no match
John Machin
  • Nice complete solution and I appreciate the escaping and testing! I'm sure this regex based approach would run faster than list comprehensions etc for any sizeable amount of data, with a fairly long list of prefixes. – RichVel Apr 08 '13 at 16:49
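Whether the regex actually beats a plain loop depends on the data and the Python version, so it is worth measuring. A rough `timeit` sketch; the sample names and the repetition count are made up for illustration, and no particular result is implied:

```python
import re
import timeit

prefixes = ["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_", "also_longer_"]
names = ["i_count", "value", "also_longer_name", "b_flag"] * 1000


def strip_loop(name):
    for p in prefixes:
        if name.startswith(p):
            return name[len(p):]
    return name


matcher = re.compile("|".join(re.escape(p) for p in prefixes)).match


def strip_regex(name):
    m = matcher(name)
    return name[m.end():] if m else name


# Both approaches must agree before their timings mean anything.
assert [strip_loop(n) for n in names] == [strip_regex(n) for n in names]

for func in (strip_loop, strip_regex):
    seconds = timeit.timeit(lambda: [func(n) for n in names], number=20)
    print(func.__name__, round(seconds, 4))
```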

When it comes to search and efficiency, always think of indexing techniques to improve your algorithms. If you have a long list of prefixes, you can build an in-memory index by simply indexing the prefixes by their first character in a dict.

This solution is only worthwhile if you have a long list of prefixes and performance becomes an issue.

pref = ["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_"]

# Index the prefixes by their first character. Do this only once.
d = {}
for x in pref:
    d.setdefault(x[0], []).append(x)

name = "c_abcdf"

# Look up d so that only prefixes sharing the name's first character are checked.
# name[:1] (rather than name[0]) also works for an empty name.
candidates = d.get(name[:1], [])
result = [x for x in candidates if name.startswith(x)]
print(result)
Manuel Salvadores
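Taking the indexing idea a step further, a character trie (a different technique from the first-character dict in the previous answer, swapped in here for illustration) finds the longest matching prefix by walking the name one character at a time, so the work depends on the prefix length rather than on how many prefixes there are. A minimal sketch; `build_trie`, `longest_prefix` and `END` are hypothetical names:

```python
END = object()  # sentinel key marking "a prefix ends here"; cannot clash with a character


def build_trie(prefixes):
    """Build a nested-dict trie from the prefixes. Do this only once."""
    root = {}
    for p in prefixes:
        node = root
        for ch in p:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def longest_prefix(name, trie):
    """Return the longest prefix of `name` stored in `trie`, or ''."""
    node, best = trie, 0
    for i, ch in enumerate(name):
        node = node.get(ch)
        if node is None:  # no stored prefix continues with this character
            break
        if END in node:
            best = i + 1
    return name[:best]


trie = build_trie(["i_", "c_", "m_", "long_prefix_"])
name = "long_prefix_value"
print(name[len(longest_prefix(name, trie)):])  # -> value
```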

You could use a simple regex:

import re
prefixes = ("i_", "c_", "longer_")
name = re.sub(r'^(%s)' % '|'.join(map(re.escape, prefixes)), '', name)

Or, if anything preceding an underscore is a valid prefix:

name = name.split('_', 1)[-1]

This removes everything up to and including the first underscore; a name without an underscore is returned unchanged.

wihlke

This edits the list on the fly, removing prefixes. The break skips the rest of the prefixes once one is found for a particular item.

items = ['this', 'that', 'i_blah', 'joe_cool', 'what_this']
prefixes = ['i_', 'c_', 'a_', 'joe_', 'mark_']

for i,item in enumerate(items):
    for p in prefixes:
        if item.startswith(p):
            items[i] = item[len(p):]
            break

print(items)

Output

['this', 'that', 'blah', 'cool', 'what_this']
Mark Tolonen
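If the original list should be left untouched, the same loop can move into a helper that builds a new list instead of assigning into `items`. A sketch; `strip_first` is a hypothetical name:

```python
items = ['this', 'that', 'i_blah', 'joe_cool', 'what_this']
prefixes = ['i_', 'c_', 'a_', 'joe_', 'mark_']


def strip_first(item, prefixes):
    # Return item without the first matching prefix (in list order), if any.
    for p in prefixes:
        if item.startswith(p):
            return item[len(p):]
    return item


stripped = [strip_first(item, prefixes) for item in items]
print(stripped)  # ['this', 'that', 'blah', 'cool', 'what_this']
```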
import re

def make_multi_prefix_replacer(prefixes):
    if isinstance(prefixes, str):
        prefixes = prefixes.split()
    # Longest prefixes first, so e.g. "yaku" is tried before "ya".
    # sorted() builds a new list instead of mutating the caller's.
    prefixes = sorted(prefixes, key=len, reverse=True)
    pat = r'\b(%s)' % "|".join(map(re.escape, prefixes))
    print('regex pattern:', repr(pat), '\n')
    reg = re.compile(pat)
    def suber(x):
        return reg.sub('', x)
    return suber


pfxs = "x ya foobar yaku foo a|b z."
replacer = make_multi_prefix_replacer(pfxs)

names = "xenon yadda yeti yakute food foob foobarre foo a|b a b z.yx zebra".split()
for name in names:
    print(repr(name), '\n', repr(replacer(name)), '\n')

ss = 'the yakute xenon is a|bcdf in the barfoobaratu foobarii'
print('\n', repr(ss), '\n', repr(replacer(ss)), '\n')
eyquem
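Note that `\b` matches at the start of every word, so a `\b`-based replacer strips prefixes from words in the middle of a longer string too (which the `ss` example appears meant to show). If only a leading prefix of the whole string should go, anchoring with `^` instead does that. A small comparison with illustrative prefixes:

```python
import re

prefixes = ["foobar", "foo", "x"]
alternation = "|".join(map(re.escape, sorted(prefixes, key=len, reverse=True)))

every_word = re.compile(r"\b(?:%s)" % alternation)    # start of any word
leading_only = re.compile(r"^(?:%s)" % alternation)   # start of the string only

text = "xenon food"
print(repr(every_word.sub("", text)))    # 'enon d'
print(repr(leading_only.sub("", text)))  # 'enon food'
```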