
I've been poring over the Twitter docs for some time now, and I've hit a wall on how to get stats on follower growth over a period of time, or the count of tweets over a period of time.

I'd like to understand from the community what since_id, max_id, and count mean in the Twitter API.

I've been following this page https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline

I'm trying to get these stats for a user:

  • counts of tweets in a particular time period
  • count of followers over a particular time period
  • count of retweets

I'd like some help forming query strings for the above.

Thanks.

Serge S.
Hrishikesh Choudhari

3 Answers


since_id and max_id are both very simple parameters you can use to limit what you get back from the API. From the docs:

since_id - Returns results with an ID greater than (that is, more recent than) the specified ID. There are limits to the number of Tweets which can be accessed through the API. If the limit of Tweets has occurred since the since_id, the since_id will be forced to the oldest ID available.

max_id - Returns results with an ID less than (that is, older than) or equal to the specified ID.

So, if you have a given tweet ID, you can search for older or newer tweets by using these two parameters.
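To make the inclusive/exclusive difference concrete, here is a small local simulation in Python, not a call to the real API. `select` is a hypothetical helper that filters a list of integer IDs the way the quoted docs describe, newest first and capped at `count`; the "forced to the oldest ID available" rule for since_id is not modelled.

```python
def select(ids, since_id=None, max_id=None, count=20):
    """Filter IDs like the docs describe: since_id exclusive, max_id inclusive, newest first."""
    newest_first = sorted(ids, reverse=True)
    window = [
        i for i in newest_first
        if (since_id is None or i > since_id) and (max_id is None or i <= max_id)
    ]
    return window[:count]  # count only caps how many come back


timeline = list(range(1, 11))  # ten tweets with IDs 1..10
print(select(timeline, since_id=7))            # [10, 9, 8]     -> strictly newer than 7
print(select(timeline, max_id=4))              # [4, 3, 2, 1]   -> 4 itself and older
print(select(timeline, since_id=3, max_id=6))  # [6, 5, 4]      -> between the two
```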

count is even simpler -- it specifies the maximum number of tweets you want to get back per request, up to 200.
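For the query-string part of the question, a sketch that builds a request URL with Python's `urllib.parse.urlencode`. It assumes the v1.1 `statuses/user_timeline.json` endpoint documented on the page linked in the question, with its `screen_name`, `count`, `since_id` and `max_id` parameters; `user_timeline_url` is a hypothetical helper and `twitterapi` is just an example handle. A real request also has to be signed with OAuth, which is not shown.

```python
from urllib.parse import urlencode

# v1.1 endpoint from the reference page linked in the question (OAuth signing not shown)
BASE_URL = "https://api.twitter.com/1.1/statuses/user_timeline.json"


def user_timeline_url(screen_name, count=200, since_id=None, max_id=None):
    """Build a user_timeline URL; since_id/max_id are only added when given."""
    params = {"screen_name": screen_name, "count": min(count, 200)}  # 200 is the per-request cap
    if since_id is not None:
        params["since_id"] = since_id
    if max_id is not None:
        params["max_id"] = max_id
    return BASE_URL + "?" + urlencode(params)


print(user_timeline_url("twitterapi", count=100, max_id=1234567890))
# https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name=twitterapi&count=100&max_id=1234567890
```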

Unfortunately, the API will not give you exactly what you want -- you cannot specify a date/time when querying user_timeline, although you can specify one when using the search API. Anyway, if you need to use user_timeline, you will need to poll the API, gathering up tweets, checking whether they match the parameters you want, and then calculating your stats accordingly.
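A minimal sketch of that polling approach, run against an in-memory list rather than the live API. `fetch_page` is a hypothetical stand-in for one authenticated `user_timeline` request with `count` and `max_id`. Each tweet is assumed to be a dict with an integer `id`, a `created_at` already parsed into a timezone-aware `datetime` (the API itself returns a string), and, for retweets, a `retweeted_status` key. "Count of retweets" is interpreted here as how many entries in the window are retweets. The loop stops on an empty page rather than a short one, on the assumption that a short page does not always mean the timeline is exhausted.

```python
from datetime import datetime, timezone


def fetch_page(tweets, count, max_id=None):
    """Hypothetical stand-in for one user_timeline request: newest first, max_id inclusive."""
    newest_first = sorted(tweets, key=lambda t: t["id"], reverse=True)
    if max_id is not None:
        newest_first = [t for t in newest_first if t["id"] <= max_id]
    return newest_first[:count]


def stats_for_period(tweets, start, end, count=200):
    """Walk the timeline backwards with max_id; count entries with start <= created_at < end."""
    total = retweets = 0
    max_id = None
    while True:
        page = fetch_page(tweets, count, max_id)
        if not page:
            break  # no older tweets left
        for t in page:
            if start <= t["created_at"] < end:
                total += 1
                if "retweeted_status" in t:
                    retweets += 1
        oldest = page[-1]
        if oldest["created_at"] < start:
            break  # everything further back predates the window
        max_id = oldest["id"] - 1  # max_id is inclusive, so step past the oldest tweet seen
    return {"tweets": total, "retweets": retweets}


def make_tweet(i):
    """Fake tweet i, created on May i, 2013; every third one is a retweet."""
    t = {"id": i, "created_at": datetime(2013, 5, i, tzinfo=timezone.utc)}
    if i % 3 == 0:
        t["retweeted_status"] = {}  # assumed shape: retweets embed the original tweet here
    return t


tweets = [make_tweet(i) for i in range(1, 11)]
start = datetime(2013, 5, 3, tzinfo=timezone.utc)
end = datetime(2013, 5, 8, tzinfo=timezone.utc)
print(stats_for_period(tweets, start, end, count=2))  # {'tweets': 5, 'retweets': 2}
```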

Serge S.
muffinista
  • Thanks for this answer. If I want to convert a date to the corresponding `since_id` for a `user_timeline` query, is your suggestion to use the `search` API function first to determine a correct ID to use for a given date? – cboettig Sep 29 '12 at 02:50
  • That's certainly one way to do it, and I can't think of another way to do it offhand. – muffinista Oct 01 '12 at 12:28
  • @muffinista: How do we know if we have reached the oldest possible/allowable value of max_id? Suppose I set count = 100, and each time I fetch tweets I set max_id to the ID of the last tweet received in the previous call. In this scenario, how will I know when I have reached the limit? – user1599964 May 25 '13 at 12:58
  • @user1599964 If you were to do that, at some point you would get back fewer than 100 results, and at that point you've presumably reached the end of the tweets. – muffinista May 28 '13 at 23:57
  • @muffinista: I am trying to use both max_id and since_id in the same call to search for a query. I set max_id to the oldest tweet ID (the last ID received) and since_id to the newest tweet ID (the first ID received) in the same call, but I always get this error: '[{u'message': u'Missing or invalid url parameter.', u'code': 195}'. Any ideas? – Daisy Oct 20 '14 at 22:23
  • @Daisy Make sure that max_id is greater than since_id -- I think https://github.com/tweepy/tweepy/issues/375 has the details. – muffinista Oct 21 '14 at 12:04
  • @muffinista: The problem is that it is not greater. I am searching Twitter using a query, then saving each tweet with its ID in a dictionary. After that, if I want to search with the same query again, I set max_id to the min of the saved IDs (the oldest tweet) and since_id to the max (the newest tweet), in order to get tweets newer than the first one I got. It seems this technique can't be done in one call. Or is there something I misunderstood? – Daisy Oct 21 '14 at 14:46
  • @Daisy In that case, unless I am confused, your parameters are backwards. If you imagine that you want to collect tweets #1 to #1000, you would start with since_id=1 and max_id=1000. If you need to iterate, you might do since_id=1, max_id=100, then since_id=100, max_id=200, and so on. max_id is the newest tweet you want to collect, while since_id is the oldest tweet you want. Does that make sense? – muffinista Oct 21 '14 at 18:54
  • @muffinista: Thanks a lot for your help. That's right, I understand your point about the 1000 tweets. But the Twitter API does not allow you to get 1000 tweets at once. For that, I made iterations using only max_id to get older tweets. After a while (you cannot go back further than a week), I want to check whether there are any new tweets, so I use since_id. So I was wondering if I could do that check in one call. Another thing: as far as I understood, max_id is the old tweets' ID and since_id is the newest, right?! – Daisy Oct 21 '14 at 19:08
  • @Daisy, if you are only getting new tweets, you might not need max_id at all. If you know the id of the last tweet you got in the search, you could pass that as the since_id, and the API will return up to 1000 tweets. I think reading about since_id here https://dev.twitter.com/rest/public/timelines might be helpful. – muffinista Oct 22 '14 at 11:41
  • Does anyone have a snippet or link where since_id and max_id are used for pagination? Java preferably. I've read the docs and I'm still confused. – Dr4ke the b4dass Jul 20 '18 at 20:17
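On the question in this thread about turning a date into a since_id: besides the search-API route discussed above, a different technique (not one proposed in this thread) is to compute the ID directly. Tweet IDs generated since late 2010 are Snowflake IDs, whose bits above the lowest 22 hold milliseconds since a custom epoch of 1288834974657 (Unix milliseconds); older tweets do not follow this scheme. The helper names below are hypothetical, and the sketch assumes timezone-aware datetimes.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
TWITTER_EPOCH_MS = 1288834974657  # Snowflake epoch (Nov 4, 2010), in Unix milliseconds


def id_to_datetime(tweet_id):
    """Creation time encoded in a Snowflake tweet ID (bits above the low 22)."""
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return UNIX_EPOCH + timedelta(milliseconds=ms)


def datetime_to_id(dt):
    """Smallest Snowflake ID a tweet created at dt could have (low 22 bits zero)."""
    ms = (dt - UNIX_EPOCH) // timedelta(milliseconds=1)  # exact integer milliseconds
    return (ms - TWITTER_EPOCH_MS) << 22


def id_bounds(start, end):
    """since_id/max_id that cover tweets created in [start, end)."""
    since_id = datetime_to_id(start) - 1  # since_id is exclusive
    max_id = datetime_to_id(end) - 1      # max_id is inclusive
    return since_id, max_id


start = datetime(2013, 5, 1, tzinfo=timezone.utc)
end = datetime(2013, 6, 1, tzinfo=timezone.utc)
print(id_bounds(start, end))
```

The two bounds can then be passed as since_id and max_id to a user_timeline request.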

max_id = the top of the tweet ID list; since_id = the bottom of the tweet ID list.

For more, take a close look at the last diagram here.

Rahul
aliassiri
  • When do you have to update the since_id value to get up-to-date data? – Petar Sep 14 '15 at 15:39
  • @pe60t0 I think there are two cases here. The first case is when your last executed request doesn't return any data. That means you reached the beginning ("bottom") of the timeline for your search query. The second case varies widely depending on your service/app logic: a) you can restrict the depth of your overall search (e.g. by date); OR b) if you need to react quickly to new tweets appearing, you can restart your search from the beginning on some sort of timer (e.g. reset the query's since_id to a new value every 5 minutes). – Igor Soloydenko Sep 27 '15 at 08:34

max_id and since_id are used to avoid redundant work across Twitter API calls. Picture incoming tweets piling onto a stack. Each API call specifies how many tweets (count) will be processed, but while that call is being made, new tweets may be added. If you draw out the stack and step through the process, you'll notice there can be some 'fragmentation': sections of unprocessed tweets stuck between processed ones. This is also visible in the image below.

[Image: tweet stack with a section of unprocessed tweets stuck between processed ones]

To get around this problem, two parameters keep track of the latest (highest-ID) tweet processed previously (since_id) and the oldest (lowest-ID) tweet processed most recently (max_id). since_id points to the bottom of the 'fragment' and max_id - 1 points to its top. (Note that max_id is inclusive, unlike since_id, which is why you subtract 1: otherwise the oldest tweet you just processed would be returned again.) Together, the parameters track which part of the tweet stack still needs to be processed.
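A minimal sketch of that bookkeeping in Python, run against a local list of integer IDs rather than the real API; `fetch` and `poll` are hypothetical names. Each round pages down from the top of the stack with max_id (oldest ID just processed, minus 1) until it meets the previous since_id, so no unprocessed fragment is left behind, and then uses the newest ID from the round as the next since_id.

```python
def fetch(ids, count, since_id=None, max_id=None):
    """Hypothetical stand-in for one request: newest first, since_id exclusive, max_id inclusive."""
    window = [
        i for i in sorted(ids, reverse=True)
        if (since_id is None or i > since_id) and (max_id is None or i <= max_id)
    ]
    return window[:count]


def poll(ids, since_id, count):
    """Return (new tweet IDs newest first, since_id to use next round)."""
    collected = []
    max_id = None  # the first request starts at the top of the stack
    while True:
        page = fetch(ids, count, since_id=since_id, max_id=max_id)
        if not page:
            break  # paged all the way down to since_id: the fragment is closed
        collected.extend(page)
        max_id = page[-1] - 1  # max_id is inclusive, so step below the oldest ID just processed
    next_since_id = collected[0] if collected else since_id
    return collected, next_since_id


stack = list(range(1, 13))  # tweets 1..12; 1..5 were processed in an earlier round
new, since_id = poll(stack, since_id=5, count=3)
print(new, since_id)  # [12, 11, 10, 9, 8, 7, 6] 12
```

Tweets that arrive while a round is paging down are simply picked up by the next round, because they are newer than the stored since_id.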

Mahesh Jamdade
narcissus789