You can access and scrape Twitter via advanced search without being logged in:
GET request
When performing a basic search request you get:
https://twitter.com/search?q=Babylon%205&src=typd
- q (our query encoded)
- src (assumed to be the source of the query, i.e. typed)
by default, Twitter returns top 25 results, but if you click on
all
you can get the realtime tweets:
https://twitter.com/search?f=realtime&q=Babylon%205&src=typd
JSON contents
More Tweets are loaded on the page via AJAX:
https://twitter.com/i/search/timeline?f=realtime&q=Babylon%205&src=typd&include_available_features=1&include_entities=1&last_note_ts=85&max_position=TWEET-553069642609344512-553159310448918528-BD1UO2FFu9QAAAAAAAAETAAAAAcAAAASAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Use max_position
to request the next tweets
The following json array returns all you need to scrape the contents:
https://twitter.com/i/search/timeline?f=realtime&q=Babylon%205&src=typd
- has_more_items (bool)
- items_html (html)
- max_position (key)
- refresh_cursor (key)
DOM elements
Here comes a list of DOM elements
you can use to extract
The authors twitter handle
div.original-tweet[data-tweet-id]
The name of the author
div.original-tweet[data-name]
The user ID of the author
div.original-tweet[data-user-id]
Timestamp of the post
span._timestamp[data-time]
Timestamp of the post in ms
span._timestamp[data-time-ms]
Text of Tweet
p.tweet-text
Number of Retweets
span.ProfileTweet-action–retweet > span.ProfileTweet-actionCount[data-tweet-stat-count]
Number of Favo
span.ProfileTweet-action–favorite > span.ProfileTweet-actionCount[data-tweet-stat-count]
Resources