1

I am relatively new to Python. Suppose I have the following strings:

tweet1= 'Check this out!! #ThrowbackTuesday I finally found this!!'
tweet2= "Man the summer is hot... #RisingSun #SummerIsHere Can't take it.."

Now, I am trying to delete all hashtags (words starting with #) from the tweets, so that:

tweet1= 'Check this out!!  I finally found this!!'
tweet2= "Man the summer is hot...  Can't take it.."

My code was -

tweet1= 'Check this out!! #ThrowbackTuesday I finally found this!!'
i,j=0,0
s=tweet1
while i < len(tweet1):
    if tweet1[i]=='#':
        j=i
        while tweet1[j] != ' ':
            ++j
        while i<len(tweet1) and j<len(tweet1):
            ++j
            s[i]=tweet1[j]
            ++i
    ++i
print(s)

This code gives me no output and no errors, which leads me to believe that I am using the wrong logic. Is there an easier solution to this using regex?

Bhargav Rao

3 Answers

3

You can use split and startswith to accomplish this.

split turns the tweet into a list of words, splitting on whitespace. A list comprehension then builds a new list that omits any word starting with #, which is checked with startswith. Finally, ' '.join turns the list back into a string, with the words separated by single spaces.

The code can be written as:

tweet = 'Check this out!! #ThrowbackTuesday I finally found this!!'
print(' '.join([w for w in tweet.split() if not w.startswith('#')]))

Output:

Check this out!! I finally found this!!
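
As a rough sketch of where this approach stops working, the same expression can be wrapped in a helper (the name remove_hashtags_split is made up for this illustration) and run on a few more inputs. It only drops tokens that begin with #, and split() with no arguments collapses runs of whitespace, so the original spacing is not preserved.

```python
def remove_hashtags_split(tweet):
    """Drop every whitespace-separated token that starts with '#'."""
    return ' '.join(w for w in tweet.split() if not w.startswith('#'))

tweet2 = "Man the summer is hot... #RisingSun #SummerIsHere Can't take it.."
print(remove_hashtags_split(tweet2))
# Man the summer is hot... Can't take it..

# A '#' in the middle of a token is not at the start, so the token is kept.
print(remove_hashtags_split('dsfsdf#Throwback'))
# dsfsdf#Throwback

# split() with no arguments also collapses runs of whitespace.
print(remove_hashtags_split('a   #tag   b'))
# a b
```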
Bhargav Rao
idjaw
  • I'd use if w[0]!='#', you can allow # in the middle of words. – SurDin Mar 25 '16 at 12:28
  • Or `if not w.startswith('#')` – zezollo Mar 25 '16 at 12:29
  • Thanks guys. I was thinking if tweets allow '#' in the middle. I was going to check twitter. Thanks for the heads up. :) – idjaw Mar 25 '16 at 12:30
  • ...and the pythonic `if not w.startswith("#")` is also an option – jDo Mar 25 '16 at 12:31
  • Thanks idjaw, using split is definitely the way to go. However, your expression does not account for the following cases-> 'dsfsdf#Throwback' and ' ?%*#Throwback' All is well and good if the hashtag begins with a space. –  Mar 25 '16 at 18:13
  • @IceFrog Yes. You are correct. I was actually discussing this with someone and this solution is for very simple cases. It seems like regex would be the way to go here (I was trying to think of a non-regex approach...but it was getting dirty and just...bad). Glad you accepted that answer as it seems to handle more of those (very common it seems) cases. – idjaw Mar 25 '16 at 18:18
3

Here is a regex solution:

import re
re.sub(r'#\w+ ?', '', tweet1)

The regex deletes a hash symbol followed by one or more word characters (letters, digits, or underscores), optionally followed by a space (so you don't get two spaces in a row).

You can find out plenty about regexes, both in general and in Python, with Google; it's not hard.

To allow other special characters, such as $ and @, replace \w with [\w$@]. The $@ can be swapped for whatever characters you like, since everything inside the brackets is allowed.
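
A minimal sketch of both variants, assuming a helper named remove_hashtags_regex with an extra_chars parameter (both names are invented for this example, not part of the re module). It builds the character class from \w plus any extra characters, and passes them through re.escape so that characters with a special meaning inside brackets are treated literally.

```python
import re

def remove_hashtags_regex(tweet, extra_chars=''):
    """Remove '#' plus the run of allowed characters after it, and one trailing space."""
    # re.escape keeps characters such as ']' or '^' literal inside the brackets.
    pattern = r'#[\w' + re.escape(extra_chars) + r']+ ?'
    return re.sub(pattern, '', tweet)

tweet1 = 'Check this out!! #ThrowbackTuesday I finally found this!!'
tweet2 = "Man the summer is hot... #RisingSun #SummerIsHere Can't take it.."

print(remove_hashtags_regex(tweet1))
# Check this out!! I finally found this!!
print(remove_hashtags_regex(tweet2))
# Man the summer is hot... Can't take it..

# A hashtag glued to the preceding text is removed as well.
print(remove_hashtags_regex('dsfsdf#Throwback'))
# dsfsdf

# Allowing $, % and @ after the '#'; the space before the '#' is left in place.
print(repr(remove_hashtags_regex('asdasdas #$%@asdasdasd', extra_chars='$%@')))
# 'asdasdas '
```

Note that the last call needs % in the allowed set too; with only $ and @ the match would stop at the %.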

Bhargav Rao
Alex Hall
  • Personally this was more of the solution I was looking for. Thanks Alex. Is there a way where I can account for the case wherein the hash symbol is followed by special characters as well? Eg, 'asdasdas #$%@asdasdasd' –  Mar 25 '16 at 18:20
  • You're welcome! I've edited the answer for additional characters. – Alex Hall Mar 25 '16 at 18:28
  • Edited to make it a better answer, do rollback if it's not required. Regards – Bhargav Rao Mar 28 '16 at 14:35
0

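Python doesn't have a ++ operator, so ++j just applies the unary + operator to j twice, which leaves j unchanged. As a result, the inner while loop never advances and the program hangs without printing anything. You should use j += 1 instead.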
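
The snippet below is a small sketch of that point, followed by one possible loop-based rewrite. The function name strip_hashtags_loop is made up for this example, and it is not the asker's original algorithm: Python strings are immutable, so an assignment like s[i] = ... would raise a TypeError, and the sketch collects the kept characters in a list instead.

```python
j = 5
print(++j)   # unary plus applied twice, j is unchanged
# 5
j += 1       # the actual increment
print(j)
# 6

def strip_hashtags_loop(tweet):
    """Copy the tweet character by character, skipping each '#...' run."""
    kept = []
    i = 0
    while i < len(tweet):
        if tweet[i] == '#':
            # Skip ahead to the next space (or the end of the string).
            while i < len(tweet) and tweet[i] != ' ':
                i += 1
        else:
            kept.append(tweet[i])
            i += 1
    return ''.join(kept)

tweet1 = 'Check this out!! #ThrowbackTuesday I finally found this!!'
print(strip_hashtags_loop(tweet1))
# Check this out!!  I finally found this!!
```

The spaces on both sides of the hashtag are kept, so the result has two spaces in a row, as in the expected output in the question.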

wRAR
  • Thank you for pointing that out. Is there a reason why the increment operator is not available in Python? I have been coding with C++ and Java and both use the increment operator. –  Mar 25 '16 at 18:21
  • @IceFrog http://stackoverflow.com/questions/3654830/why-are-there-no-and-operators-in-python – wRAR Mar 25 '16 at 18:48