How do I get the link inside href?

Question

I am building a bot and want to extract the href value, /VegSpringRoll/status/1205121838302420993, from the following twitter.com HTML:

<a class="css-4rbku5 css-18t94o4 css-901oao r-1re7ezh r-1loqt21 r-1q142lx r-1qd0xha r-a023e6 r-16dba41 r-ad9z0x r-bcqeeo r-3s2u2q r-qvutc0" title="9:46 PM · Dec 12, 2019" href="/VegSpringRoll/status/1205121838302420993" dir="auto" aria-label="Dec 12" role="link" data-focusable="true"></a>
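To make "the href part" concrete, here is a minimal sketch that pulls it out of the snippet above with Python's standard-library `html.parser` (the class list is shortened). It only works on a static HTML string: Twitter builds its page with JavaScript, so in the bot you would still need Selenium to render the page first, for example by feeding `bot.page_source` to the parser. `HrefCollector` is a hypothetical name.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs from the start tag
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)


# The anchor from the question, with its class list shortened for brevity
snippet = (
    '<a class="css-4rbku5 r-1re7ezh" title="9:46 PM · Dec 12, 2019" '
    'href="/VegSpringRoll/status/1205121838302420993" dir="auto" '
    'aria-label="Dec 12" role="link" data-focusable="true"></a>'
)

parser = HrefCollector()
parser.feed(snippet)
print(parser.hrefs)  # ['/VegSpringRoll/status/1205121838302420993']
```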

My script is:

import time

from selenium import webdriver
from selenium.webdriver.common.keys import Keys


class TwitterBot:
    def __init__(self, username, password):
        self.username = username
        self.password = password
        self.bot = webdriver.Firefox()

    def login(self):
        bot = self.bot
        bot.get('https://twitter.com/login')
        time.sleep(1)
        email = bot.find_element_by_class_name('js-username-field.email-input.js-initial-focus')
        password = bot.find_element_by_class_name('js-password-field')
        email.clear()
        password.clear()
        email.send_keys(self.username)
        password.send_keys(self.password)
        password.send_keys(Keys.RETURN)
        time.sleep(2)  # time.sleep() requires a duration; wait for the home page to load

    def like_tweet(self, hashtag):
        bot = self.bot
        bot.get('https://twitter.com/search?q=%23' + hashtag + '&src=type')
        time.sleep(1)
        for i in range(1, 10):
            bot.execute_script('window.scrollTo(0,document.body.scrollHeight)')  # this scroll 1 time only.
            time.sleep(1)

            # Look up the timestamp links (the <a> element quoted above)
            tweets = bot.find_elements_by_class_name('css-4rbku5 css-18t94o4 css-901oao r-1re7ezh r-1loqt21 r-1q142lx r-1qd0xha r-a023e6 r-16dba41 r-ad9z0x r-bcqeeo r-3s2u2q r-qvutc0')
            links = [elem.get_attribute('href') for elem in tweets]
            print(links)

Everything works until the `tweets` part.

But nothing gets printed. Would anybody please assist?
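Once the lookup does return elements, note that the loop re-queries the page on every scroll pass, so the same link can be printed several times. A minimal sketch of collecting the passes and de-duplicating them while keeping first-seen order; `collect_unique` is a hypothetical helper and the sample passes are made-up data, not real Twitter output.

```python
def collect_unique(batches):
    """Merge several lists of links, dropping repeats and None values.

    A dict keeps insertion order (Python 3.7+), so each link stays at the
    position where it was first seen.
    """
    unique = {}
    for batch in batches:
        for link in batch:
            if link is not None:  # get_attribute returns None for a missing attribute
                unique.setdefault(link, None)
    return list(unique)


# Hypothetical results of three scroll passes; later passes can repeat
# links that were already found earlier.
passes = [
    ["/a/status/1", "/b/status/2"],
    ["/a/status/1", "/b/status/2", "/c/status/3"],
    ["/b/status/2", "/c/status/3", None],
]
print(collect_unique(passes))  # ['/a/status/1', '/b/status/2', '/c/status/3']
```

In the bot, each pass's `links` list would be appended to a list such as `passes`, with `collect_unique(passes)` called once after the loop.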

Does `tweets` contain anything? And what is this `bot`, what kind of object is it? – Reznik, Dec 14 '19 at 10:13
First check what you get in `tweets`. Some pages generate different random class names every time you (re)load them. – furas, Dec 14 '19 at 10:16
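Following up on the random-class-names point: one way to avoid depending on those classes is to gather candidate hrefs from all links and keep only the ones shaped like a tweet permalink. This sketch assumes, based only on the href quoted in the question, that permalinks look like `/<username>/status/<numeric id>`; the candidates could come from something like `[a.get_attribute('href') for a in bot.find_elements_by_tag_name('a')]`. `STATUS_PATH` and `status_links` are hypothetical names and the sample hrefs are made up.

```python
import re

# Assumption: a tweet permalink is /<username>/status/<digits>, optionally
# preceded by a scheme and host (Selenium usually reports absolute URLs).
STATUS_PATH = re.compile(r"(?:https?://[^/]+)?/[^/]+/status/\d+")


def status_links(hrefs):
    """Keep only hrefs that look like tweet permalinks."""
    return [h for h in hrefs if h and STATUS_PATH.fullmatch(h)]


hrefs = [
    "/VegSpringRoll/status/1205121838302420993",
    "https://twitter.com/VegSpringRoll/status/1205121838302420993",
    "/VegSpringRoll",
    "/hashtag/python",
    None,
]
print(status_links(hrefs))
```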
Hello @Reznik and furas, I have updated the OP and pasted the entire script there. DebanjanB, it is copied directly from Twitter. – yts61, Dec 14 '19 at 14:56

score 2 · Accepted Answer · answered Dec 14 '19 at 13:15


Selenium does not accept compound class names (several space-separated classes) in `find_elements_by_class_name`, so you have to use a CSS selector or XPath instead. The following code should work:

tweets = bot.find_elements_by_css_selector('.css-4rbku5.css-18t94o4.css-901oao.r-1re7ezh.r-1loqt21.r-1q142lx.r-1qd0xha.r-a023e6.r-16dba41.r-ad9z0x.r-bcqeeo.r-3s2u2q.r-qvutc0')
links = [elem.get_attribute('href') for elem in tweets]
print(links)
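The selector above is just the class attribute with each space replaced by a dot. A minimal sketch of building it automatically from a copied class string; `class_attr_to_css` is a hypothetical helper, and it assumes every class name is already a valid CSS identifier (no characters that would need escaping), which holds for the Twitter classes quoted here.

```python
def class_attr_to_css(class_attr, tag=""):
    """Turn a space-separated class attribute into a compound CSS selector.

    An optional tag name is prefixed, e.g. tag="a" gives "a.css-4rbku5...".
    """
    classes = class_attr.split()  # split() also collapses repeated whitespace
    return tag + "".join("." + name for name in classes)


classes = ("css-4rbku5 css-18t94o4 css-901oao r-1re7ezh r-1loqt21 r-1q142lx "
           "r-1qd0xha r-a023e6 r-16dba41 r-ad9z0x r-bcqeeo r-3s2u2q r-qvutc0")
print(class_attr_to_css(classes))
# .css-4rbku5.css-18t94o4.css-901oao.r-1re7ezh.r-1loqt21.r-1q142lx.r-1qd0xha.r-a023e6.r-16dba41.r-ad9z0x.r-bcqeeo.r-3s2u2q.r-qvutc0
```

The result can be passed straight to `bot.find_elements_by_css_selector(...)`.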

Please read this discussion to get more info.
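One more detail: `get_attribute('href')` returns the link's resolved property value, i.e. an absolute URL such as `https://twitter.com/VegSpringRoll/status/1205121838302420993`, not the relative `/VegSpringRoll/...` written in the markup. A minimal sketch of reducing it to the path the question asks for, using `urllib.parse`; `href_path` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def href_path(url):
    """Return only the path of an absolute or relative URL (query and fragment dropped)."""
    return urlparse(url).path


print(href_path("https://twitter.com/VegSpringRoll/status/1205121838302420993"))
# /VegSpringRoll/status/1205121838302420993
```

Applied to the answer's code, this would be `links = [href_path(elem.get_attribute('href')) for elem in tweets]`.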

Naeem

Thank you @Naeem, would you please elaborate? I googled it but still have no idea. I did find answers like this one, https://stackoverflow.com/questions/33155454/how-to-find-an-element-by-href-value-using-selenium-python/33155512, but I don't see how to apply them to my situation. – yts61 Dec 14 '19 at 15:03
I rewrote it as `tweets = bot.find_element_by_xpath('//*[@class="css-4rbku5 css-18t94o4 css-901oao r-1re7ezh r-1loqt21 r-1q142lx r-1qd0xha r-a023e6 r-16dba41 r-ad9z0x r-bcqeeo r-3s2u2q r-qvutc0"]')`, but it returns the error `'FirefoxWebElement' object is not iterable`. – yts61 Dec 14 '19 at 15:37
@yts61 I have provided you with code using a CSS selector. Have you tried that? – Naeem Dec 15 '19 at 05:57
Unfortunately it doesn't work, but never mind, I found the solution! – yts61 Dec 15 '19 at 06:32
