Is there a way to crawl Twitter without using their API? I tried the API and it works great, but I'm looking for an alternative: the crawler I'm working on will be passed around, and I don't want my token keys shared with everyone who uses it, nor do I want each of them to go through the hassle of creating a developer account and so on.
The crawler I built with the Twitter API can retrieve a very large number of tweets. The one I built without it only manages around 10, because the rest of the tweets aren't in the initial HTML.
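From what I can tell, the profile page only ships the first batch of tweets in the server-rendered HTML; everything older is loaded by JavaScript as you scroll, so a plain HTTP request never sees it. A quick check along these lines makes the cutoff visible (the account name is just a placeholder; the tweet-text class is the same one my code below looks for):

import requests
from bs4 import BeautifulSoup

# Count how many tweets the raw HTML actually contains.
html = requests.get("https://twitter.com/someuser").text   # placeholder account
soup = BeautifulSoup(html, "html.parser")
print(len(soup.find_all("p", {"class": "tweet-text"})))    # only the first batch, not the full history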
I am using Python 3.6. Here is my non-API crawler:
def spider(targetname, DOMAIN):
    # Read the crawled-links file once instead of re-opening it for every tweet.
    with open(crawledfile) as f:
        already_crawled = f.read()

    for item in g_data:
        try:
            name = item.find_all("strong", {"class": "fullname show-popup-with-id "})[0].text
            username = item.find_all("span", {"class": "username u-dir"})[0].text
            post = item.find_all("p", {"class": "TweetTextSize TweetTextSize--normal js-tweet-text tweet-text"})[0].text
            retweetby = item.find_all("a", {"href": "/" + targetname})[0].text

            # Reply/retweet/like counts sit in a visually hidden action list.
            # Reset them per item so values from the previous tweet don't leak in.
            replies = retweets = likes = ""
            subdatas = item.find_all("div", {"class": "ProfileTweet-actionCountList u-hiddenVisually"})
            for subdata in subdatas:
                counts = subdata.find_all("span", {"class": "ProfileTweet-actionCountForAria"})
                replies, retweets, likes = counts[0].text, counts[1].text, counts[2].text

            # The permalink anchor carries both the tweet URL and its timestamp.
            datas = item.find_all("a", {"class": "tweet-timestamp js-permalink js-nav js-tooltip"})
            for data in datas:
                link = DOMAIN + data["href"]
                date = data["title"]
                if link not in already_crawled:
                    already_crawled += link + "\n"  # also skips duplicates within this run
                    append_to_crawled(crawledfile, name, username, post, link, replies, retweets, likes, retweetby, date)
                    output(name, username, post, link, replies, retweets, likes, retweetby, date)
        except (IndexError, KeyError):
            # Skip stream items that don't match the expected markup
            # (promoted content, items that aren't retweets, and so on).
            pass
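The snippet assumes a few globals and helpers that aren't shown (g_data, crawledfile, append_to_crawled, output). A setup roughly like the following would make it runnable; the profile URL, the account name, the container class, and the two helper stubs are placeholders, not the exact code:

import requests
from bs4 import BeautifulSoup

DOMAIN = "https://twitter.com"
targetname = "someuser"      # placeholder account name
crawledfile = "crawled.txt"

def append_to_crawled(path, *fields):
    # stub for the real helper: append one line per tweet
    with open(path, "a") as f:
        f.write(" | ".join(fields) + "\n")

def output(*fields):
    # stub for the real helper: just print the tweet data
    print(fields)

open(crawledfile, "a").close()   # make sure the file exists before spider() reads it

# The server-rendered profile page wraps each tweet in a stream item container;
# the class name here is a guess, adjust it to whatever markup the page serves.
html = requests.get(DOMAIN + "/" + targetname).text
soup = BeautifulSoup(html, "html.parser")
g_data = soup.find_all("li", {"class": "js-stream-item"})

spider(targetname, DOMAIN)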