I am trying to extract the hashtags from tweets. All of the tweets are in one column of a CSV file. Although there are resources on parsing a string and putting the extracted hashtags into a list, I haven't come across a solution for parsing tweets that are already stored in a list or dictionary. Here is my code:
import csv
import re

with open('hash.csv', 'rb') as f:
    reader = csv.reader(f, delimiter=',')
    for line in reader:
        tweet = line[1:2]  # This is the column that contains the tweets
        for x in tweet:
            match = re.findall(r"#(\w+)", x)
            if match: print x
Predictably, I get 'TypeError: expected string or buffer', because 'tweet' in this case is not a string; it is a list.
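To illustrate the list-versus-string distinction, here is a minimal sketch written for Python 3 (an assumption; the code above is Python 2, and Python 3 words the message as "expected string or bytes-like object"). The row contents and the helper `try_findall` are made up for the example. Slicing with `[1:2]` gives a one-element list, while indexing with `[1]` gives the string itself, and `re.findall` only accepts the latter.

```python
import re

# Hypothetical row, standing in for one line yielded by csv.reader
row = ["42", "Loving the #sunshine and #coffee today"]

sliced = row[1:2]   # slice: a one-element list, ['Loving the ...']
indexed = row[1]    # index: the string itself

def try_findall(value):
    """Return the hashtags found, or the name of the exception raised."""
    try:
        return re.findall(r"#(\w+)", value)
    except TypeError as exc:
        return type(exc).__name__

list_result = try_findall(sliced)     # passing the list raises TypeError
string_result = try_findall(indexed)  # passing the string works

print(list_result)
print(string_result)
```

If this matches what is happening, using `line[1]` instead of `line[1:2]`, or passing each element of the slice rather than the slice itself, should avoid the error.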
Here is where my research has taken me thus far:
Parsing a tweet to extract hashtags into an array in Python
http://www.tutorialspoint.com/python/python_reg_expressions.htm
So now I'm iterating through the match list, but I'm still getting the whole tweet and not the hashtagged item. I was able to strip the hashtag away, but what I want is to strip away everything except the hashtag.
with open('hash.csv', 'rb') as f:
    reader = csv.reader(f, delimiter=',')
    for line in reader:
        tweet = line[1:2]
        print tweet
        for x in tweet:
            match = re.split(r"#(\w+)", x)
            hashtags = [i for i in tweet if match]
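One likely reason the whole tweet comes back: `re.split` always returns a non-empty list, so `if match` is always true and the comprehension simply copies every item of `tweet`. A minimal sketch of one alternative, again assuming Python 3, is to keep what `re.findall` returns instead of testing it. The in-memory sample data, the `HASHTAG` pattern name, and the `extract_hashtags` helper are hypothetical stand-ins; the real file layout may differ.

```python
import csv
import io
import re

HASHTAG = re.compile(r"#(\w+)")

# Hypothetical stand-in for hash.csv: tweets are assumed to be in column 1
sample = io.StringIO(
    '1,"Loving the #sunshine and #coffee today"\n'
    '2,"No tags here"\n'
    '3,"#python is fun"\n'
)

def extract_hashtags(csv_file, column=1):
    """Collect every hashtag (without the '#') found in one CSV column."""
    hashtags = []
    for row in csv.reader(csv_file, delimiter=","):
        if len(row) <= column:
            continue  # skip short or empty rows
        # row[column] is a string, so findall can be applied directly;
        # it returns only the captured group, i.e. the text after '#'
        hashtags.extend(HASHTAG.findall(row[column]))
    return hashtags

all_hashtags = extract_hashtags(sample)
print(all_hashtags)  # ['sunshine', 'coffee', 'python']
```

With a real file, the same function could be called as `extract_hashtags(open('hash.csv', newline=''))` in Python 3. To keep hashtags grouped per tweet instead of in one flat list, `append` could be used in place of `extend`.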