
For a project, I want to be able to create a dataset of tweets containing some particular string of symbols. Since I would also like to go as far back in time as possible, I tried using the GetOldTweets script ( https://github.com/Jefferson-Henrique/GetOldTweets-python ) mentioned here: https://stackoverflow.com/a/35077920/5858873 .

The issue is that it cannot extract tweets when the query consists of symbols. In fact, you cannot even search Twitter directly for tweets containing the required symbols.

To explain the problem more clearly, consider the following sample case: I would like to extract all tweets containing the string '!!!' posted within the last two years.

What is the best way to do this (if this even is doable)?

Melsauce
  • One way is to grab the data (tweets) and manually parse them for your symbols (which will be slow but will get the job done). Another is to look up Twitter's API and see if it supports a search function. A quick Google search yields [this](https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets). – MooingRawr Nov 23 '17 at 14:31
  • @MooingRawr in which case, however, I would have to first extract ALL tweets (which is impossible). Also, I don't think the Twitter API allows searching for tweets that only contain symbols. – Melsauce Nov 23 '17 at 14:43
  • @Melsauce I think it does. Or at least it did two years ago, the library I used to use was called 'tweepy'. – Arne Nov 23 '17 at 14:44
  • If you want to go official, [it looks like it will cost you](https://developer.twitter.com/en/docs/tweets/search/guides/premium-operators). I would be VERY surprised if Twitter didn't support symbol search while it allows normal search... – MooingRawr Nov 23 '17 at 14:45
  • @ArneRecknagel The problem with Tweepy is that it only allows you to extract tweets from up to a week ago. – Melsauce Nov 23 '17 at 14:45
  • @MooingRawr I don't think the proprietary version allows you to search for symbol-only queries either. :/ – Melsauce Nov 23 '17 at 14:48
  • `To match strings containing punctuation (e.g. coca-cola), symbol, or separator characters, you must use a quoted exact match as described below.` From the last link I listed. The comment section isn't meant for this back and forth, so I will yield until a few days have passed. Good luck. – MooingRawr Nov 23 '17 at 14:53
  • I feel this is important because it will help future posters (these are all relevant clarifications!). What you quoted works if the phrase *contains symbols*, but does not work for a string comprised *entirely of symbols*. So, for example, the "Coca-cola!" query would yield results, but not "!!!". – Melsauce Nov 23 '17 at 15:00
  • Could a sequence of queries be made, going through an enumeration of characters possibly preceding the desired symbols? ` !!!`, `a!!!`, `b!!!`, `c!!!`... – Reblochon Masque Nov 24 '17 at 06:32
  • Could you elaborate, @ReblochonMasque? I'm not sure I understand. From what I can make out, you mean working around the problem by including characters before the desired sequence. The problem is, Twitter entirely IGNORES the symbols. For example, try searching "coke!!". It will return tweets containing "coke", ignoring the "!!". – Melsauce Nov 24 '17 at 06:43
  • Yes, that is what I meant. – Reblochon Masque Nov 24 '17 at 06:44
  • Unfortunately nothing seems to work. – Melsauce Nov 24 '17 at 06:53
  • Is it possible to search using Unicode? – johnashu Nov 26 '17 at 03:26
  • Could you provide me with an example? I was informed that searching using Unicode simply ignores the symbols altogether. – Melsauce Nov 26 '17 at 18:06
  • The answers [here](https://stackoverflow.com/questions/15589533/searching-for-tweets-with-unicode-character-apple-emoji) indicate that you can't use the standard Twitter search to search for arbitrary Unicode symbols, but you can use the Streaming Search. So that may also be the solution to your `!!!` symbol search problem. – PM 2Ring Nov 26 '17 at 18:19
  • @PM2Ring I'll try this out. On a different note, would I be able to get tweets as far back as I'd like, though? – Melsauce Nov 26 '17 at 18:23
  • Sorry, I have no idea. I don't use Twitter, and I don't know anything about its API. – PM 2Ring Nov 26 '17 at 18:24
  • No problem. As far as I know, the issue with the Streaming API is that you get access to only the most recent tweets. – Melsauce Nov 26 '17 at 18:26

3 Answers


You can write your own regular expression based on your requirement and then apply it to the Twitter data you retrieve to extract the specific tweets.
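As a minimal sketch, assuming the tweet texts have already been retrieved as plain Python strings (the names `TRIPLE_BANG` and `filter_tweets` and the sample data are hypothetical), the regular-expression filter could look like this:

```python
import re

# Matches a run of three or more consecutive exclamation marks.
TRIPLE_BANG = re.compile(r"!{3,}")


def filter_tweets(texts, pattern=TRIPLE_BANG):
    """Return the tweet texts in which `pattern` occurs anywhere."""
    return [text for text in texts if pattern.search(text)]


if __name__ == "__main__":
    # Hypothetical sample data standing in for tweets you have already fetched.
    sample = ["Best day ever!!!", "Just a normal tweet.", "Wait!! what", "NO WAY!!!!"]
    for text in filter_tweets(sample):
        print(text)  # prints "Best day ever!!!" and "NO WAY!!!!"
```

`!{3,}` also matches longer runs such as `!!!!`; if you want runs of exactly three, a pattern with lookarounds such as `(?<!!)!{3}(?!!)` is one option. Note that this only filters tweets you already have; it does not get around Twitter's search ignoring symbols.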

krits
  • Could you talk more about "hitting the twitter data"? Are you talking about their API, some particular scraper etc? Also, I would be grateful if you could provide screenshots of how this worked for you. – Melsauce Nov 30 '17 at 02:46
  • Yes, check out the link for the API description https://marcobonzanini.com/2015/03/02/mining-twitter-data-with-python-part-1/ – krits Nov 30 '17 at 08:32
  • That does not seem to work for me. Can you run the code searching for "!!!" and post a screenshot if that worked for you? – Melsauce Dec 02 '17 at 09:03

I found this interesting resource: https://webapps.stackexchange.com/questions/92196/search-for-tweets-with-special-characters

It basically says that certain characters cannot be searched because Twitter has blocked their use.

I believe what you should do is go through all the tweets within a certain scope, and then use the string method `find` on the body of each tweet. You would then stop once you reach a certain run time or a specific number of matching tweets.
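A rough sketch of that scan, assuming the tweets arrive as an iterable of text strings (for example, a generator fed by whatever scraper you use); `scan_tweets` and its parameters are hypothetical names, not part of any library:

```python
import time


def scan_tweets(texts, needle="!!!", max_hits=100, max_seconds=60.0):
    """Collect tweet texts containing `needle`, scanning them in order.

    Stops early once `max_hits` matches have been collected or
    `max_seconds` of wall-clock time have elapsed, whichever comes first.
    """
    hits = []
    deadline = time.monotonic() + max_seconds
    for text in texts:
        if time.monotonic() > deadline:
            break
        # str.find returns -1 when the substring is absent.
        if text.find(needle) != -1:
            hits.append(text)
            if len(hits) >= max_hits:
                break
    return hits
```

Because it accepts any iterable, the scan can stop pulling tweets from a generator as soon as either limit is reached. `needle in text` would be the more idiomatic equivalent of the `find` check.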

IMCoins
  • As stated in the example in the question, I want to find all tweets containing the required string within a time frame of 2 years. That makes extracting ALL tweets infeasible because of the sheer number, even if there were a way to extract all tweets for that duration. – Melsauce Nov 30 '17 at 02:49
  • You were wondering `(if this even is doable)`. And I don't think it is, and I provided some sources for it, along with another solution. – IMCoins Nov 30 '17 at 09:39

You can download and store data from the Twitter API using various criteria (searching for words from a dictionary, location search, popular Twitter accounts, etc.). It certainly won't be the whole dataset, but you will have some part of it.

Then search these tweets locally.

These characters are also valid in URLs, so strip out the URLs before searching.
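A minimal sketch of that local search, assuming the stored tweets have been loaded as a list of text strings; the deliberately simple URL pattern (`http://` or `https://` followed by non-whitespace) is an assumption, not a complete URL matcher, and `strip_urls`/`search_local` are hypothetical helper names:

```python
import re

# Simplified URL pattern (an assumption): scheme followed by non-whitespace.
URL_RE = re.compile(r"https?://\S+")


def strip_urls(text):
    """Remove URLs so characters inside them cannot produce false matches."""
    return URL_RE.sub("", text)


def search_local(texts, needle="!!!"):
    """Return the stored tweet texts whose non-URL content contains `needle`."""
    return [text for text in texts if needle in strip_urls(text)]


if __name__ == "__main__":
    stored = ["see https://example.com/a!!!b", "wow!!! https://t.co/x", "plain"]
    print(search_local(stored))  # ['wow!!! https://t.co/x']
```

The first sample tweet is excluded because its only `!!!` sits inside a URL.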

Also don't forget to check whether storing data you got from Twitter is legal.

alpere
  • The problem is, I have no way of knowing ALL of the search terms (or even the most common ones) that co-occur with my required string. Let's say I download all the tweets for the search term "apples" and then search locally among these tweets for "!!!". Sure, I'll get all tweets that have "apples" and "!!!", but what about all the other tweets on Twitter that do not have "apples"? A partial solution is not feasible in this case because there is no frequent itemset for my search query. – Melsauce Dec 02 '17 at 09:08