I'm looking at a Twitter dataset and ran into a problem when trying to remove the mentions from the tweets that contain them. I tried the following:
echo ' "@user lol I needed it! went to sleep around 3am and woke up around 5 am! lol horrible! "' | \
sed 's/@.*[[:blank:]]//g'
My expected output is "lol I needed it! went to sleep around 3am and woke up around 5 am! lol horrible! "; however, I'm simply getting two quotation marks (""). I find this really weird, as the following dummy example works (it outputs "zzz" "dfg"):
echo '"zzz" "@abc dfg"' | sed 's/@.*[[:blank:]]//g'
I'm using GNU sed, and the dataset I'm looking at can be downloaded here: http://help.sentiment140.com/for-students/. Any idea why this might be failing?
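One thing that may explain the difference, offered as a sketch rather than a confirmed diagnosis: `.*` is greedy, so `@.*[[:blank:]]` matches from the `@` up to the last blank on the line. In the tweet the last blank sits just before the closing quote, while in the dummy example the only blank after the `@` directly follows the mention. Assuming a mention is simply `@` followed by non-blank characters, a narrower pattern could look like this (`strip_mentions` is a hypothetical helper name, not something from the dataset or from sed):

```shell
# Hypothetical helper: remove each @mention together with any blanks that follow it.
# [^[:blank:]]* stops at the first space or tab, so the match cannot run to the
# end of the line the way .* does.
strip_mentions() {
  sed 's/@[^[:blank:]]*[[:blank:]]*//g'
}

echo ' "@user lol I needed it! went to sleep around 3am and woke up around 5 am! lol horrible! "' | strip_mentions
echo '"zzz" "@abc dfg"' | strip_mentions
```

Note that this would also clip anything else containing an `@`, such as an email address, so it is only a starting point.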