1

I am trying to remove all @mentions from a string. Is there a faster way to do this?

text = "hey @foo say hi to @bar"
textsplit = text.split()
n = -1
ts2 = textsplit
for x in textsplit:
    n += 1
    if x[0]== "@":
        del ts2[n]
text = ' '.join(ts2)

Thanks in advance. (This is sort of like Removing elements from a list containing specific characters but this one is a little different.)

Community

6 Answers

3

This does the same as your code:

' '.join(x for x in text.split() if not x.startswith('@'))
elyase
1

What about this one, using the re module and a regular expression:

import re
print(" ".join(re.sub(r'^@\w+', '', w) for w in text.split()))
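One thing to watch with a per-word substitution like this: a word that consists only of a mention turns into an empty string, and the join keeps the gap. A minimal sketch, assuming you want single spaces in the result (the names `words`, `with_gaps` and `cleaned` are just for illustration):

```python
import re

text = "hey @foo say hi to @bar"

# Per-word substitution leaves empty strings where the mentions were.
words = [re.sub(r'^@\w+', '', w) for w in text.split()]

# Joining as-is keeps the gaps; dropping the empty strings first avoids them.
with_gaps = " ".join(words)
cleaned = " ".join(w for w in words if w)

print(repr(with_gaps))
print(repr(cleaned))
```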
Marcin
1

This is simpler and faster:

text = "hey @foo say hi to @bar"
newtext = ' '.join([i for i in text.split() if not i.startswith('@')])
Chris Johnson
1
import re

text = "hey @foo say hi to @bar"
newtext = re.sub(r' @[!\w]+', '', text)

No need to use any loops; simply use a regular expression.
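Note that the pattern above only matches a mention that has a space in front of it, so a mention at the very start of the string is left alone. A possible variant, as a sketch only: it assumes mentions are delimited by whitespace, and the helper name `strip_mentions` is made up for this example.

```python
import re

def strip_mentions(text):
    # (?<!\S) means "not preceded by a non-whitespace character", so the
    # mention may sit at the start of the string or after whitespace.
    # The trailing \s* swallows the whitespace that followed the mention.
    return re.sub(r'(?<!\S)@\w+\s*', '', text).strip()

print(strip_mentions("hey @foo say hi to @bar"))
print(strip_mentions("@foo hey @bar"))

# For comparison, the space-prefixed pattern misses the leading mention.
print(re.sub(r' @[!\w]+', '', "@foo hey @bar"))
```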

blue0cean
0

I defer to @elyase and @chris-johnson's answers for the actual simple beautiful code that you should use.

@elyase's answer is simpler, but I think @chris-johnson's might be slightly more efficient because of how join works. @elyase's code creates a generator object, then join will convert it into a list before running, which I believe has more overhead than just creating a list to begin with. But this is a minor optimization point.
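If you want to check the generator-versus-list point on your own machine, something like the following could be used. It is only a rough sketch: the input size and repeat count are arbitrary choices, and the timings will vary between machines and Python versions.

```python
import timeit

# Arbitrary larger input so the difference, if any, is measurable.
text = "hey @foo say hi to @bar " * 1000

def with_generator():
    return ' '.join(x for x in text.split() if not x.startswith('@'))

def with_list():
    return ' '.join([x for x in text.split() if not x.startswith('@')])

# Both variants produce the same string; only the speed may differ.
same_result = with_generator() == with_list()

gen_time = timeit.timeit(with_generator, number=100)
list_time = timeit.timeit(with_list, number=100)
print(f"generator: {gen_time:.4f}s  list: {list_time:.4f}s")
```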

I also spotted a few code smells in your sample code, so I want to point them out.

text = "hey @foo say hi to @bar"
textsplit = text.split()
n = -1
ts2 = textsplit # code smell 1
for x in textsplit:
    n += 1 # code smell 2
    if x[0]== "@":
        del ts2[n] # code smell 3
text = ' '.join(ts2)

Code smell 1: I imagine you want to create a copy of a list with ts2 = textsplit, but this isn't happening. You're just creating another name for the list that textsplit refers to, so changing ts2 will change textsplit and vice versa. You can do ts2 = textsplit[:] to make a copy of a non-nested list.
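A quick way to see the difference between a second name and a copy (the variable names `alias` and `shallow_copy` are just for illustration):

```python
textsplit = "hey @foo say hi to @bar".split()

alias = textsplit            # another name for the same list object
shallow_copy = textsplit[:]  # a new list holding the same items

del alias[1]                 # this also removes '@foo' from textsplit

print(textsplit)             # the deletion is visible through both names
print(shallow_copy)          # the copy still has all six words
print(alias is textsplit, shallow_copy is textsplit)
```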

Code smell 2: You're creating a variable n and using that as the index by manually incrementing at each iteration. If that's all you're doing, use for n, x in enumerate(textsplit) instead.
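For example, collecting the positions of the mentions with enumerate instead of a hand-maintained counter might look like this (`mention_positions` is a name invented for the example):

```python
textsplit = "hey @foo say hi to @bar".split()

# enumerate yields (index, item) pairs, so no manual counter is needed.
mention_positions = [n for n, x in enumerate(textsplit) if x.startswith('@')]
print(mention_positions)
```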

Code smell 3: Two things here:

  1. Because you didn't copy textsplit, you're looping over a list and modifying it at the same time. Avoid this at all costs; it causes bugs that are insanely hard to reason about.
  2. Even if ts2 were a copy, this line is problematic because when you delete an element in ts2, the index gets thrown out of sync. In your example, after deleting '@foo', the indices are now off by one, so trying to access/delete '@bar' using ts2[n] will throw an IndexError. If you are going to engage in index twiddling, you need to decrement n every time you delete an item.

But generally, index twiddling is a source of many many bugs. Don't do it if you don't have to. And in Python you often don't have to.
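To make the out-of-sync index concrete, here is a small sketch that deletes from a real copy while indexing by the original list, followed by the index-free alternative (the names `error` and `kept` are just for illustration):

```python
textsplit = "hey @foo say hi to @bar".split()

# Deleting from a copy by the original's index: after the first deletion
# the copy is one item shorter, so the index of '@bar' is past its end.
ts2 = textsplit[:]
error = None
try:
    for n, x in enumerate(textsplit):
        if x.startswith('@'):
            del ts2[n]
except IndexError as exc:
    error = exc
print(repr(error))

# Building a new list sidesteps index bookkeeping entirely.
kept = [x for x in textsplit if not x.startswith('@')]
print(' '.join(kept))
```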

zehnpaard
0

It occurred to me that all the other answers are operating under the assumption that you wish to remove the @... substring and maintain a separation of ' ' between different words (or sets of characters other than ' '), as evidenced by your code. However, the question does not otherwise explicitly state this as the objective. And, since there could potentially be a situation where (don't ask me) this behaviour isn't the correct one, here we go!

Edit: Readable and flexible now (vs old code-golfy versions)

My original post was a bit silly in that the code really wasn't meant for production; it worked, but that was it. This now accomplishes three types of substring removal effortlessly, although perhaps it could be done better with regular expressions (not too experienced there).

text = "hey @foo say hi to @bar"

Regular version with only a single ' ' to separate the remaining words

newText = ''.join(
    text[i] if text.rfind('@', 0, i+2) <= text.rfind(' ', 0, i+1) else
    '' for i in range(len(text)))

>>> 'hey say hi to'

Removes only the specified substring (without removing any other whitespace)

newText = ''.join(
    text[i] if text.rfind('@', 0, i+1) <= text.rfind(' ', 0, i+1) else
    '' for i in range(len(text)))

>>> 'hey  say hi to '

Transforms the substring into whitespace

newText = ''.join(
    text[i] if text.rfind('@', 0, i+1) <= text.rfind(' ', 0, i+1) else
    ' ' for i in range(len(text)))

>>> 'hey      say hi to     '
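For what it's worth, the three variants might be written with regular expressions roughly as follows. This is a sketch, not a drop-in equivalent: it assumes a mention runs from '@' to the next whitespace, it has only been checked against the sample text, and the variable names are invented for the example.

```python
import re

text = "hey @foo say hi to @bar"

# Single space between the remaining words: remove each mention together
# with the one space in front of it.
single_spaced = re.sub(r' ?@\S*', '', text)

# Remove only the mention, leaving the surrounding whitespace untouched.
mention_only = re.sub(r'@\S*', '', text)

# Replace every character of the mention with a space.
blanked = re.sub(r'@\S*', lambda m: ' ' * len(m.group()), text)

print(repr(single_spaced))
print(repr(mention_only))
print(repr(blanked))
```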

Hope this helps, somehow!

Eithos