remove elements from a list that begin with a specific character

Question

I am trying to remove all mentions of anyone (words starting with "@") from a string, and I was wondering if there is a faster way to do this.

text = "hey @foo say hi to @bar"
textsplit = text.split()
n = -1
ts2 = textsplit
for x in textsplit:
    n += 1
    if x[0]== "@":
        del ts2[n]
text = ' '.join(ts2)

Thanks in advance. (This is sort of like "Removing elements from a list containing specific characters", but this one is a little different.)

score 3 · Accepted Answer · answered Jan 05 '15 at 02:06


This does the same as your code:

' '.join(x for x in text.split() if not x.startswith('@'))
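For reuse, the one-liner can be wrapped in a small function (the name remove_mentions is only an illustrative choice, not part of the answer). Because it works on whitespace-separated words, an "@" in the middle of a word, such as in an e-mail address, is left alone; note also that text.split() collapses runs of whitespace into single spaces.

```python
def remove_mentions(text):
    """Hypothetical helper: drop every whitespace-separated word that starts with '@'."""
    return ' '.join(x for x in text.split() if not x.startswith('@'))


print(remove_mentions("hey @foo say hi to @bar"))      # hey say hi to
print(remove_mentions("mail me@example.com or @bob"))  # mail me@example.com or
```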

elyase

Marcin · Answer 2 · 2015-01-05T02:07:08.237


What about this one, using the re module and a regular expression:

import re
print(" ".join(re.sub(r'^@\w+', '', w) for w in text.split()))  # removed words become '', so doubled spaces remain
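Because re.sub turns each mention into an empty string rather than dropping it, joining the results leaves doubled spaces (and a trailing one). A minimal variation, assuming you want the same single-spaced output as the other answers, filters out the empty strings before joining. Note that ^@\w+ only strips the word characters after the "@", so punctuation such as the comma in "@foo," would survive.

```python
import re

text = "hey @foo say hi to @bar"

# Blank out the "@name" part of each word that starts with "@"...
words = (re.sub(r'^@\w+', '', w) for w in text.split())
# ...then skip the now-empty strings so no extra spaces are introduced.
result = " ".join(w for w in words if w)
print(result)  # hey say hi to
```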

edited Jan 05 '15 at 02:07

answered Jan 05 '15 at 02:01


score 1 · Answer 3 · answered Jan 05 '15 at 02:10


This is simpler and faster:

text = "hey @foo say hi to @bar"
newtext = ' '.join([i for i in text.split() if not i.startswith('@')])

Chris Johnson

score 1 · Answer 4 · answered Jan 05 '15 at 04:13


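import re
text = "hey @foo say hi to @bar"
# removes each " @word"; a mention at the very start of the string has no space before it and is kept
newtext = re.sub(r' @[!\w]+', '', text)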

No need for an explicit loop; simply use a regular expression.
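One limitation of the pattern above is that it requires a space before the "@", so a mention at the very beginning of the string is not removed. A sketch of a variant that handles that case uses a negative lookbehind to insist that the "@" starts a word. It assumes a mention runs from the "@" to the next whitespace character (so trailing punctuation, as in "@foo,", is removed too); the name strip_mentions is hypothetical.

```python
import re

# \s*       : also consume the whitespace before the mention, so no double spaces remain
# (?<!\S)@  : an "@" at the start of the string or right after whitespace
# \S+       : the rest of the mention, up to the next whitespace character
MENTION = re.compile(r'\s*(?<!\S)@\S+')


def strip_mentions(text):
    # lstrip() cleans up the space left behind when the string starts with a mention
    return MENTION.sub('', text).lstrip()


print(strip_mentions("hey @foo say hi to @bar"))  # hey say hi to
print(strip_mentions("@foo say hi"))              # say hi
print(strip_mentions("mail me@example.com"))      # mail me@example.com
```

Unlike the split-based answers, this keeps any other whitespace (tabs, newlines) in the text as it was.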

blue0cean

zehnpaard · Answer 5 · 2015-01-07T13:24:57.933

I defer to @elyase and @chris-johnson's answers for the actual simple beautiful code that you should use.

@elyase's answer is simpler, but I think @chris-johnson's might be slightly more efficient because of how join works. @elyase's code creates a generator object, and str.join then converts it into a list internally (it needs all the pieces up front to size the result), which I believe has more overhead than just creating a list to begin with. But this is a minor optimization point.
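If you want to check this on your own machine, a quick timeit comparison of the two one-liners is sketched below. It deliberately prints the timings instead of declaring a winner, since the difference (if any) depends on the Python version and the input size.

```python
import timeit

text = "hey @foo say hi to @bar"


def with_generator():
    # @elyase's version: a generator expression passed straight to join
    return ' '.join(x for x in text.split() if not x.startswith('@'))


def with_list():
    # @chris-johnson's version: build the list first, then join it
    return ' '.join([x for x in text.split() if not x.startswith('@')])


# Both must produce the same string before their speed is worth comparing.
assert with_generator() == with_list() == "hey say hi to"

for func in (with_generator, with_list):
    seconds = timeit.timeit(func, number=100_000)
    print(f"{func.__name__}: {seconds:.3f} s for 100,000 calls")
```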

I also noticed a few code smells in your sample code and want to point them out.

text = "hey @foo say hi to @bar"
textsplit = text.split()
n = -1
ts2 = textsplit # code smell 1
for x in textsplit:
    n += 1 # code smell 2
    if x[0]== "@":
        del ts2[n] # code smell 3
text = ' '.join(ts2)

Code smell 1: I imagine you want to create a copy of the list with ts2 = textsplit, but that isn't happening. You're just creating another name for the list that textsplit refers to, so changing ts2 will change textsplit and vice versa. You can do ts2 = textsplit[:] to make a (shallow) copy, which is enough for a non-nested list.
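A short demonstration of the difference between binding a second name and slicing a copy. The last lines show why [:] is only enough for a non-nested list: the copy is shallow, so inner lists are still shared, whereas copy.deepcopy from the standard library copies them too.

```python
import copy

textsplit = "hey @foo say hi to @bar".split()
alias = textsplit        # a second name for the same list object
ts2 = textsplit[:]       # a new list holding the same elements

del alias[1]             # removes '@foo' from textsplit as well
print(textsplit)         # ['hey', 'say', 'hi', 'to', '@bar']
print(ts2)               # ['hey', '@foo', 'say', 'hi', 'to', '@bar']

nested = [['@foo'], ['bar']]
shallow = nested[:]              # the inner lists are shared, not copied
deep = copy.deepcopy(nested)     # the inner lists are copied as well
shallow[0].append('@baz')
print(nested[0])         # ['@foo', '@baz']
print(deep[0])           # ['@foo']
```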

Code smell 2: You're creating a variable n and using it as the index by manually incrementing it at each iteration. If that's all you're doing, use for n, x in enumerate(textsplit) instead.
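For instance, enumerate hands you the index and the word together, so there is no counter to keep in step by hand. Here it is used to list where the mentions are (mention_positions is just an illustrative name).

```python
textsplit = "hey @foo say hi to @bar".split()

# enumerate yields (index, item) pairs, replacing the manual `n += 1` bookkeeping
mention_positions = [n for n, x in enumerate(textsplit) if x.startswith('@')]
print(mention_positions)  # [1, 5]
```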

Code smell 3: Two things here:

1. Because you didn't copy textsplit, you're looping over a list and modifying it at the same time. Avoid this at all costs; it causes bugs that are insanely hard to reason about. Here, each deletion shifts the next word into the slot the loop has just visited, so that word is never inspected: with "hey @foo @baz hi", '@baz' survives. Your example only works because the skipped word ('say') doesn't start with "@".
2. Even if ts2 were a copy, this line is problematic, because deleting an element from ts2 throws the indices out of sync. In your example, after deleting '@foo', every later index is off by one, so trying to delete '@bar' with ts2[n] raises an IndexError. If you are going to engage in index twiddling, you need to decrement n every time you delete an item.
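Both problems can be reproduced with a small, hypothetical helper that runs the question's loop either on the list itself or on a copy. The input "hey @foo @baz hi" shows the skipped element, the sample sentence shows the IndexError, and the second function applies the decrement fix.

```python
def loop_as_in_question(text, copy_first):
    """Hypothetical helper: the question's loop, with or without copying the list."""
    textsplit = text.split()
    ts2 = textsplit[:] if copy_first else textsplit
    n = -1
    for x in textsplit:
        n += 1
        if x[0] == "@":
            del ts2[n]
    return ' '.join(ts2)


def loop_with_decrement(text):
    """Hypothetical helper: the same loop on a copy, shifting n back after each deletion."""
    textsplit = text.split()
    ts2 = textsplit[:]
    n = -1
    for x in textsplit:
        n += 1
        if x[0] == "@":
            del ts2[n]
            n -= 1  # ts2 just got one item shorter, so later indices move down by one
    return ' '.join(ts2)


# Without a copy, the word right after a deleted one is never inspected:
print(loop_as_in_question("hey @foo @baz hi", copy_first=False))  # hey @baz hi

# With a copy, the indices drift and the last deletion is out of range:
try:
    loop_as_in_question("hey @foo say hi to @bar", copy_first=True)
except IndexError:
    print("IndexError")

print(loop_with_decrement("hey @foo @baz hi"))         # hey hi
print(loop_with_decrement("hey @foo say hi to @bar"))  # hey say hi to
```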

But generally, index twiddling is a source of many bugs. Don't do it if you don't have to. And in Python you often don't have to.
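As one example of not having to: if you really do need to modify the original list in place (for instance because other names refer to it), slice assignment replaces its contents with a filtered version in one step, with no index bookkeeping.

```python
textsplit = "hey @foo say hi to @bar".split()
alias = textsplit  # another name for the same list

# Assigning to the full slice mutates the existing list object instead of rebinding the name.
textsplit[:] = [x for x in textsplit if not x.startswith('@')]

print(alias)            # ['hey', 'say', 'hi', 'to']
print(' '.join(alias))  # hey say hi to
```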

Woah thanks for explaining, I learned some new things from that, I will be sure to remember this next time! — The Only One Around, Jan 05 '15 at 03:35

Eithos · Answer 6 · 2015-01-05T07:30:18.067

It occurred to me that all the other answers assume that you want to remove the @... substring and keep a single ' ' between the remaining words (or runs of characters other than ' '), as your code does. However, the question does not explicitly state this as the objective. And since there could be a situation where (don't ask me) this behaviour isn't the right one, here we go!

Edit: Readable and flexible now (vs old code-golfy versions)

My original post was a bit silly in that the code really wasn't meant for production; it worked, but that was it. This version handles three kinds of substring removal easily, although perhaps it could be done better with regular expressions (I'm not too experienced there).

text = "hey @foo say hi to @bar"

Regular version with only a single `' '` to separate the remaining words

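# Keep text[i] unless an '@' occurs after the last space up to i; the i+2 window
# also drops the character directly before each '@' (normally the separating space).
newText = ''.join(
    text[i] if text.rfind('@', 0, i+2) <= text.rfind(' ', 0, i+1) else
    '' for i in range(len(text)))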

>>> 'hey say hi to'

Removes only the specified substring (without removing any other whitespace)

# Keep text[i] unless an '@' occurs after the last space up to and including i.
newText = ''.join(
    text[i] if text.rfind('@', 0, i+1) <= text.rfind(' ', 0, i+1) else
    '' for i in range(len(text)))

>>> 'hey  say hi to '

Transforms the substring into whitespace

# Same test as above, but each removed character becomes a space instead.
newText = ''.join(
    text[i] if text.rfind('@', 0, i+1) <= text.rfind(' ', 0, i+1) else
    ' ' for i in range(len(text)))

>>> 'hey      say hi to     '
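Since this answer wonders whether regular expressions would do better, here is a sketch of roughly equivalent re.sub calls. It assumes, like the code above, that a mention runs from the "@" up to the next ' ' (other whitespace is not treated as a separator). One difference: the first pattern only removes a space right before the "@", whereas the character-by-character version drops whatever character precedes it, so odd inputs such as "a@b" can come out differently.

```python
import re

text = "hey @foo say hi to @bar"

# A mention: "@" followed by everything up to the next space.
MENTION = r'@[^ ]*'

# 1. Single space between the remaining words: also drop one space in front of each mention.
single = re.sub(' ?' + MENTION, '', text)

# 2. Remove only the mention itself, leaving the surrounding spaces alone.
removed = re.sub(MENTION, '', text)

# 3. Replace each mention with the same number of spaces.
blanked = re.sub(MENTION, lambda m: ' ' * len(m.group()), text)

print(repr(single))   # 'hey say hi to'
print(repr(removed))  # 'hey  say hi to '
print(repr(blanked))
```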

Hope this helps, somehow!
