import re

def clean_tweet(self, tweet):
    return ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())

What do the join() and split() functions do here? What does the ' ' after return mean?

1 Answer

(...)|(...)|(...)

This is alternation: match the first, the second, or the third regex in parentheses.

@[A-Za-z0-9]+

Matches "@" followed by any series of letters (lower or upper case) and digits. '+' means one or more of these characters, matching as many as possible.
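As a small illustration of this first alternative on its own, here is a sketch using a made-up sample string (the names `mention` and `sample` are only for this example). Note that the match stops at the underscore, because `_` is not in the character class.

```python
import re

# The first alternative from the question's pattern, on its own.
mention = re.compile(r"@[A-Za-z0-9]+")

# Hypothetical sample text, not from the original question.
sample = "thanks @alice99 and @bob_smith!"

# findall returns every non-overlapping match, left to right.
found = mention.findall(sample)
print(found)  # ['@alice99', '@bob']
```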

[^0-9A-Za-z \t]

A single character that must not be (^ inside [] negates the set) one of: a digit, a letter (upper or lower case), a space, or a tab. One character only.

\w+:\/\/\S+

\w+ means the longest possible match of word characters (letters, digits, or underscore; at least one), followed by ':', followed by // (escaping the / is harmless, though Python's re does not require it). Finally, \S+ matches one or more non-whitespace characters, as many as possible.
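To tie this back to the join() and split() part of the question, the one-liner can be unrolled into three steps. This is only a sketch with a made-up tweet; the variable names are illustrative. re.sub replaces every match with a space, split() with no arguments breaks the result on any run of whitespace (dropping the empty pieces), and ' '.join(...) glues the words back together with a single space between them. The ' ' after return is simply the separator string on which join is called.

```python
import re

# Same pattern as in the question, written as a raw string.
PATTERN = r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)"

# Hypothetical input, chosen to exercise all three alternatives.
tweet = "@user Check this out: https://example.com/page   great stuff!!"

# Step 1: mentions, URLs, and stray punctuation each become a single space.
replaced = re.sub(PATTERN, " ", tweet)

# Step 2: split on runs of whitespace, so repeated spaces vanish.
words = replaced.split()

# Step 3: join the words using ' ' as the separator.
cleaned = " ".join(words)

print(words)    # ['Check', 'this', 'out', 'great', 'stuff']
print(cleaned)  # Check this out great stuff
```

Without the split/join round trip, the replaced string would still contain leading spaces and long runs of spaces wherever something was removed.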

kabanus