I have a list of strings containing tweet text, such as the sample below (the real data set is much larger than this). I want to extract every user name that follows an @ and every URL from each tweet, for example galaxy5univ and the URL.
tweet_text = ['@galaxy5univ I like you',
              "RT @BestOfGalaxies: Let's sit under the stars ...",
              '@jonghyun__bot .........((thanks)',
              'RT @yosizo: thanks.ddddd <https://yahoo.com>',
              'RT @LDH_3_yui: #fam, ccccc https://msn.news.com']
My code:
import re

pu = re.compile(r'http\S+')   # URL: "http" followed by any non-whitespace
pn = re.compile(r'@(\S+)')    # name: any non-whitespace after "@"

for row in tweet_text:
    urls = pu.findall(row)
    names = pn.findall(row)
    print("url: ", urls)
    print("name: ", names)
After testing this code on a large number of tweets, I found that both patterns, for the URL and for the name, are wrong, although they do work on some tweets. Is there any documentation or reference on extracting user names and URLs from tweet text that holds up on a large data set?
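For illustration, here is a minimal sketch of what tighter patterns might look like. It rests on assumptions that are not confirmed for the full data set: that a user name consists only of ASCII letters, digits, and underscores, and that a URL ends at whitespace, an angle bracket, or a quote. With `\S+`, a name such as `@BestOfGalaxies:` keeps its trailing colon and `<https://yahoo.com>` keeps its closing `>`. The names `MENTION_RE`, `URL_RE`, and `extract_mentions_and_urls` are hypothetical, chosen only for this example.

```python
import re

# Assumption: user names are ASCII letters, digits and underscores only.
# The lookbehind skips an '@' that directly follows a word character,
# as in an e-mail address like user@example.com.
MENTION_RE = re.compile(r'(?<!\w)@(\w+)', re.ASCII)

# Assumption: a URL runs until whitespace, an angle bracket or a quote.
URL_RE = re.compile(r'https?://[^\s<>"\']+')


def extract_mentions_and_urls(text):
    """Return (user names, URLs) found in one tweet string."""
    mentions = MENTION_RE.findall(text)
    # Drop sentence punctuation that happens to trail a URL.
    urls = [url.rstrip('.,;:!?)') for url in URL_RE.findall(text)]
    return mentions, urls


if __name__ == '__main__':
    sample = [
        "RT @BestOfGalaxies: Let's sit under the stars ...",
        'RT @yosizo: thanks.ddddd <https://yahoo.com>',
    ]
    for row in sample:
        names, links = extract_mentions_and_urls(row)
        print("name:", names, "url:", links)
```

This is a sketch rather than a complete tweet parser; shortened links, non-ASCII text around names, and other edge cases in real data may need further adjustment.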
If you have any advice on extracting user names and URLs from Twitter data, please let me know. Thanks!