
I was wondering if there are ways to crawl Twitter without using their API? I tried using their API and it was awesome. However, I would like to ask if there is an alternative. As the crawler I am working on will be passed around, I do not wish for my token keys to be shared, and neither do I want everyone to go through the hassle of creating a developer account, and so on and so forth.

The crawler I created with the Twitter API is capable of retrieving many, many tweets. The crawler I created without it was only able to crawl around 10, as the other tweets fall outside the initially loaded HTML.

I am using Python 3.6.

def spider(targetname, DOMAIN):
    # g_data (the list of tweet elements) and crawledfile (the path of the
    # already-crawled log) are defined elsewhere in the script
    for item in g_data:
        try:
            name = item.find_all("strong", {"class": "fullname show-popup-with-id "})[0].text
            username = item.find_all("span", {"class": "username u-dir"})[0].text
            post = item.find_all("p", {"class": "TweetTextSize TweetTextSize--normal js-tweet-text tweet-text"})[0].text
            retweetby = item.find_all("a", {"href": "/" + targetname})[0].text
            subdatas = item.find_all("div", {"class": "ProfileTweet-actionCountList u-hiddenVisually"})
            for subdata in subdatas:
                replies = subdata.find_all("span", {"class": "ProfileTweet-actionCountForAria"})[0].text
                retweets = subdata.find_all("span", {"class": "ProfileTweet-actionCountForAria"})[1].text
                likes = subdata.find_all("span", {"class": "ProfileTweet-actionCountForAria"})[2].text
            datas = item.find_all("a", {"class": "tweet-timestamp js-permalink js-nav js-tooltip"})
            for data in datas:
                link = DOMAIN + data["href"]
                date = data["title"]
            # only record tweets we have not seen before
            if link not in open(crawledfile).read():
                append_to_crawled(crawledfile, name, username, post, link,
                                  replies, retweets, likes, retweetby, date)
            output(name, username, post, link, replies, retweets, likes, retweetby, date)
        except (IndexError, KeyError, NameError):
            # skip tweets whose markup doesn't match the expected classes
            continue
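As a side note on the duplicate check: `open(crawledfile).read()` re-reads the entire file for every tweet and never closes the handle. A minimal sketch of a cheaper variant, read once before the loop (`load_crawled` is a hypothetical helper name; `crawledfile` is assumed to be the same path the script already uses):

```python
def load_crawled(path):
    """Read the crawled-links file once, returning its full text
    ("" if the file doesn't exist yet), so the membership test
    becomes a fast in-memory check instead of a re-read per tweet."""
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        return ""
```

Inside `spider()` you would then call `crawled = load_crawled(crawledfile)` once before the loop and test `if link not in crawled:` for each tweet.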
NewbieCoder

1 Answer


There is a way to crawl/scrape Twitter without using the Twitter API; however, it is highly recommended that you use the API itself. It has several advantages, such as being official and having a ton of support from the community.

Nevertheless, you can perform crawling using requests and Beautiful Soup, or, if you're looking for a more powerful option, go for Selenium with PhantomJS.
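To illustrate the requests/Beautiful Soup route, here is a minimal sketch. The `tweet-text` class name is an assumption based on Twitter's legacy markup and can change without notice, which is one more reason the official API is the safer choice:

```python
import requests
from bs4 import BeautifulSoup

def parse_tweets(html):
    """Pull the text of every tweet out of a timeline's HTML.

    Assumes tweets live in <p> elements carrying the legacy
    "tweet-text" class; adjust the selector if the markup changes.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [p.get_text() for p in soup.find_all("p", class_="tweet-text")]

def fetch_tweets(username):
    # Download a public profile page; a browser-like User-Agent helps
    # avoid being served a stripped-down version of the page.
    resp = requests.get("https://twitter.com/" + username,
                        headers={"User-Agent": "Mozilla/5.0"})
    resp.raise_for_status()
    return parse_tweets(resp.text)
```

Like your own crawler, this only sees the tweets present in the initial HTML; anything loaded later by JavaScript requires Selenium (or similar) to render the page first.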

Here are a couple of similar questions that you can read through:

Scraping of the Twitter follower page using selenium and phantomjs

How to collect tweets about an event that are posted on specific date using python?

How to perform oauth when doing twitter scraping with python requests

Infinite Web Scraping Twitter

Cheers :)

Sreetam Das