def clean_tweet(self, tweet):
    return ' '.join(re.sub("(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())
What is the use of the join() and split() functions? What does the ' ' after return mean?
(...)|(...)|(...)
Means the first, second, or third regex in parentheses: the pattern matches if any one of the three alternatives matches.
@[A-Za-z0-9]+
Matches "@" followed by a run of letters (upper or lower case) and digits. '+' means the longest possible match of this run, with at least one character.
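A small sketch of how this first alternative behaves on its own. The sample text is made up for illustration; note that the underscore is not in the character class, so the match stops in front of it.

```python
import re

# Made-up sample text, not from the question.
text = "hi @alice99 and @bob_smith"

# '@' followed by one or more letters or digits; '_' is not included,
# so the second mention is cut off before the underscore.
mentions = re.findall(r"@[A-Za-z0-9]+", text)
print(mentions)  # ['@alice99', '@bob']
```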
[^0-9A-Za-z \t]
A single character that must not be (that is what ^ inside [] means) one of: a digit, a letter (upper or lower case), a space, or a tab. One character only.
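To see which characters this negated class picks out, here is a minimal sketch on a made-up string. Each match is exactly one character that is not a digit, letter, space, or tab.

```python
import re

# Made-up sample text, not from the question.
text = "Hello, world! #1"

# Every single character outside 0-9, A-Z, a-z, space, and tab.
unwanted = re.findall(r"[^0-9A-Za-z \t]", text)
print(unwanted)  # [',', '!', '#']
```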
\w+:\/\/\S+
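\w+ means the longest possible match of word characters (letters, digits, or underscore; at least one), followed by ':', followed by '//' (written here as \/\/; escaping the / is not required in Python, but it is harmless). Finally, \S+ matches the longest possible run of at least one non-whitespace character. Together this matches URL-like text such as a scheme, "://", and everything up to the next whitespace.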
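As for join() and split(): re.sub replaces every match with a single space, which usually leaves runs of several spaces behind. Calling split() with no arguments breaks the string on any run of whitespace and drops the empty pieces, and ' '.join(...) glues the words back together using the string ' ' (one space) as the separator. That is what the ' ' after return is: the string whose join method is being called. A minimal sketch, assuming a standalone function instead of the method from the question and a made-up sample tweet:

```python
import re

# Same pattern as in the question, written as a raw string so the
# backslashes reach the regex engine unchanged.
PATTERN = r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)"

def clean_tweet(tweet):
    # Standalone version of the method (no self), for illustration only.
    spaced = re.sub(PATTERN, " ", tweet)  # each match becomes one space
    words = spaced.split()                # split on any run of whitespace
    return " ".join(words)                # rejoin with single spaces

sample = "@user Check this out: https://t.co/abc123 #cool!!"  # made-up tweet
cleaned = clean_tweet(sample)
print(cleaned)  # Check this out cool
```

The mention, the URL, and the punctuation are each replaced by a space, and the split/join pair then collapses the leftover whitespace.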