
I am trying to scrape a tweet's text using its tweet ID. My code is:

import selenium
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

browser=webdriver.Chrome()
base_url='https://twitter.com/FoxNews/status/'
query='964981093127655424'
url=base_url+query
browser.get(url)
title=browser.find_element_by_tag_name('title')
print(title)

The output is:

selenium.webdriver.remote.webelement.WebElement(session="7ca1c0e4c33d62a122bc51bbc171c7eb", element="0.37665530454795326-1")

How can I print the text in a human-readable format? (In this case: "On Twitter, former President @BillClinton called for a renewal of the Assault Weapons Ban".)

Jayanth
  • 2
    @Jayanth, you have already got two qualified answers, you should select either of them as your expected solution. It seems you are not comfortable at marking [answers](https://stackoverflow.com/users/7924106/jayanth). – SIM Feb 18 '18 at 11:15
  • 2
    You are doing neither yourself nor Twitter a favor by getting tweets this way. Have a look at https://developer.twitter.com/en/docs/tweets/post-and-engage/api-reference/get-statuses-show-id . Web scraping should only be done when there is no other option, and Twitter provides an API, so use it. – hansTheFranz Feb 18 '18 at 12:03

2 Answers


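You can use the .text attribute of the WebElement class, for example print(title.text). Note that .text returns only text that is rendered on the page, so for a <title> element it may come back empty; browser.title returns the page title directly.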
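
As a minimal sketch of reading an element's text, the helper below uses .text and falls back to the textContent property when .text is empty. The names readable_text and print_page_title are hypothetical, and the sketch assumes Selenium 4 with a Chrome driver available; treat it as one possible approach rather than the definitive one.

```python
def readable_text(element):
    """Return an element's text, falling back to textContent when .text is empty."""
    text = element.text
    if not text:
        # textContent holds the DOM text regardless of whether it is displayed.
        text = element.get_attribute("textContent") or ""
    return text.strip()


def print_page_title(url):
    # Imported lazily so readable_text can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    browser = webdriver.Chrome()
    try:
        browser.get(url)
        # Selenium 4 spelling; find_element_by_tag_name is the older form.
        title = browser.find_element(By.TAG_NAME, "title")
        print(readable_text(title))
        print(browser.title)  # the driver also exposes the page title directly
    finally:
        browser.quit()
```

Calling print_page_title('https://twitter.com/FoxNews/status/964981093127655424') would open the page from the question and print its title twice, once via the element and once via the driver.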

I don't think Selenium is the best way to scrape a site. You are better off using requests or urllib combined with BeautifulSoup; driving an actual browser is slow and harder to control (cookies, HTML attributes, etc.).

Ofek .T.


As Ofek pointed out, using a combination of requests/urllib and bs4 would be a better option for scraping.


In order to get the text you are interested in, you could do something like this:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://twitter.com/FoxNews/status/964981093127655424")
soup = BeautifulSoup(page.content, "html.parser")

tweet = soup.select_one(".js-tweet-text-container .TweetTextSize--jumbo")
print(tweet.get_text())


Your output would look like:

'On Twitter, former President @BillClinton called for a renewal of the Assault Weapons Ban.pic.twitter.com/hPaFyhGSfd'


Now, let's break down what we did. First, requests sends a GET request to the Twitter server, and the response for the given URL is stored as a Response object in page. We then create a BeautifulSoup object from page.content.

To find the tweet text, we pass a CSS selector to select_one, which returns the first matching element.
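
To illustrate the selector step in isolation, here is a small sketch that runs select_one against a hard-coded HTML string instead of a live page. SAMPLE_HTML and extract_tweet_text are made-up names, and the markup only imitates the class names used above; the real page may differ or be rendered by JavaScript, in which case the selector would match nothing.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup that mimics the class names in the answer.
SAMPLE_HTML = """
<div class="js-tweet-text-container">
  <p class="TweetTextSize--jumbo">Example tweet text</p>
</div>
"""


def extract_tweet_text(html, selector=".js-tweet-text-container .TweetTextSize--jumbo"):
    """Return the text of the first element matching selector, or None."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one(selector)
    # select_one returns None when nothing matches, so guard before get_text().
    return node.get_text(strip=True) if node is not None else None


print(extract_tweet_text(SAMPLE_HTML))             # Example tweet text
print(extract_tweet_text("<p>no tweet here</p>"))  # None
```

Checking for None before calling get_text() avoids an AttributeError when the page layout changes and the selector no longer matches.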

This is a simple scraping job, and if any of it doesn't make sense to you, I suggest you go through some tutorials. You could start with this article, which will teach you the basics of web scraping and help you get started.

Hope this helps!

nisemonoxide
  • Thanks @novice-coder for the suggested edits. I don't know why your suggestions were rejected. I have incorporated them into the answer. – nisemonoxide Feb 18 '18 at 10:56