3

I'm currently experimenting with the Twitter Streaming API. Everything works like a charm, but the API sends me tons of data that I don't need. Is there a way to filter the data the API sends me?

I'm using the following stream: https://stream.twitter.com/1.1/statuses/filter.json

Brian Tompsett - 汤莱恩
Lukas

4 Answers

6

Take a look at the filter stream of the API:

https://dev.twitter.com/docs/api/1.1/post/statuses/filter

You can supply a set of keywords as a filter to track on Twitter. According to the current limits, you can track up to 400 keywords.
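As a rough illustration of the keyword filter, here is a minimal Python sketch that builds the `track` parameter for the statuses/filter request. The function name and the constant are made up for this example, and the 400-keyword limit is simply the figure quoted above, which may have changed since.

```python
# Hypothetical helper: builds the form parameters for statuses/filter.
# The limit is the one quoted in this answer and may be out of date.
MAX_TRACK_KEYWORDS = 400


def build_filter_params(keywords):
    """Return a dict with a comma-separated `track` value."""
    # Drop empty entries and surrounding whitespace.
    cleaned = [k.strip() for k in keywords if k.strip()]
    if not cleaned:
        raise ValueError("at least one keyword is required")
    if len(cleaned) > MAX_TRACK_KEYWORDS:
        raise ValueError(
            f"too many keywords: {len(cleaned)} > {MAX_TRACK_KEYWORDS}"
        )
    # Commas separate the individual keywords in the track parameter.
    return {"track": ",".join(cleaned)}
```

The resulting dict would be sent as the POST body with whatever HTTP client and authentication you already use.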

After retrieving the tweets, you still have to filter them yourself to remove noisy data.

So if you can describe what you are looking for with a set of keywords, you will get what you want. There will always be noise in your data, though, because it is almost impossible to define something that precisely with simple keyword filtering.

For example, let's assume you want to track all tweets related to a brand named XYZ. To get tweets about brand XYZ, you might use a one-word keyword set that contains only "XYZ". The API will give you all tweets containing XYZ. But suppose "XYZ" also means something in some language: people who speak that language will tweet that word, and you will receive those tweets too. Suppose also that there is a city called XYZ and people post check-in messages from there. At that point you need to filter out the tweets that are not related to your topic, either by language detection or by contextual information retrieval. The key, though, is to specify a keyword set that covers your topic.
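The second filtering pass could look something like the sketch below. It assumes each tweet has already been parsed into a dict with the `lang` and `text` fields of the v1.1 payload. The function name, the default language, and the noise terms are placeholders for this example, and a real contextual filter would need more than substring checks.

```python
def is_relevant(tweet, wanted_langs=("en",), noise_terms=("checked in",)):
    """Keep a tweet only if its language is wanted and no noise term appears."""
    # `lang` is the language code attached to the tweet payload.
    if tweet.get("lang") not in wanted_langs:
        return False
    text = tweet.get("text", "").lower()
    # Crude contextual check: drop tweets containing a known noise phrase.
    return not any(term in text for term in noise_terms)


# Made-up sample tweets, for illustration only.
tweets = [
    {"lang": "en", "text": "Loving my new XYZ phone"},
    {"lang": "tr", "text": "xyz"},
    {"lang": "en", "text": "I just checked in at XYZ"},
]
relevant = [t for t in tweets if is_relevant(t)]
```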

Cheers.

cubbuk
  • Hi, thanks for that, but the problem is that I don't even want to receive the "noisy" data, as I want to process lots of tweets in less time :) Maybe it isn't even possible to get a "short" version of the tweets from the API. – Lukas Jan 28 '13 at 22:34
  • @LucèBrùlè I edited my answer to clarify what the noisy data is. – cubbuk Jan 28 '13 at 23:20
  • @cubbuk : Suppose I specified 3 keywords in the filter. When I get data from the streaming API, is there a way (other than manually searching on my own) to detect which of the three keywords a tweet corresponds to? – user1599964 May 16 '13 at 21:13
  • @user1599964 As far as I know, Twitter doesn't provide any info about that; you have to figure it out manually yourself. – cubbuk May 17 '13 at 07:30
  • @cubbuk : Yes, I figured that out. Can you have a look at this question and let me know your views: http://stackoverflow.com/questions/16602483/filtering-of-tweets-received-from-statuses-filter-streaming-api – user1599964 May 17 '13 at 07:34
  • Is there any tool that can help me with language detection? – S Gaber Aug 03 '13 at 02:31
  • @cubbuk : Will the streaming API also include tweets like abXYZcd or XYZmn? Does it give me tweets that contain the filter term as a substring? For example, if I filter for "fast", will it give me tweets like "breakfast"? – Krishna Kalyan Mar 16 '16 at 00:06
  • @KrishnaKalyan I just don't know the current status, sorry. – cubbuk Mar 16 '16 at 12:18
0

The answer is "No" to the question "Is there a way (other than manually searching on my own) to detect which of the three keywords I specified in the filter a tweet corresponds to?" We have to do it manually.
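Doing it manually could be sketched roughly as follows. This is only an approximation: it looks at the tweet text alone and treats a multi-word keyword as "all words must appear", which is an assumption about how the filter matches rather than a confirmed rule. The function name is made up for this example.

```python
import re


def matched_keywords(text, keywords):
    """Return the keywords whose words all occur as whole tokens in text."""
    # Lowercase and split into word tokens so matching is case-insensitive.
    tokens = set(re.findall(r"\w+", text.lower()))
    hits = []
    for keyword in keywords:
        terms = re.findall(r"\w+", keyword.lower())
        # A keyword counts as matched when every one of its words is a token.
        if terms and all(term in tokens for term in terms):
            hits.append(keyword)
    return hits
```

Because this sketch compares whole tokens, "fast" would not be reported for a tweet that only contains "breakfast". Whether the API itself behaves the same way should be checked against the current documentation.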

0

If possible, you can migrate to Twitter API v2. By default it returns only the id and text fields, which reduces the incoming data dramatically. You can then add fields parameters to request any extra data you want.
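A small sketch of how the request URL might be built, assuming the v2 filtered stream endpoint and the `tweet.fields` query parameter. The endpoint URL and the field names `created_at` and `lang` are taken from my reading of the v2 documentation and should be verified before use. Authentication is left out.

```python
from urllib.parse import urlencode

# Assumed v2 filtered stream endpoint; verify against the current docs.
STREAM_URL = "https://api.twitter.com/2/tweets/search/stream"


def build_stream_url(extra_fields=()):
    """Return the stream URL, optionally requesting extra tweet fields."""
    if not extra_fields:
        # With no fields parameter, only the default fields come back.
        return STREAM_URL
    # Keep commas unescaped so the field list stays readable.
    query = urlencode({"tweet.fields": ",".join(extra_fields)}, safe=",")
    return f"{STREAM_URL}?{query}"
```

Calling `build_stream_url(["created_at", "lang"])` would request those two fields in addition to the defaults.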

Shanazar
-1

Look at the BackType Storm project. It has examples of filtering the API stream using twitter4j.

Dmitry