
Can anyone tell me why the code below won't return an emoji attribute...

from selenium.webdriver import Chrome
import time
from selenium.common.exceptions import NoSuchElementException
import re
import pyautogui  # used in main() to tap a key so the machine stays awake

# open webpage and allow time to load entirely
driver = Chrome()
driver.implicitly_wait(15)
driver.get("https://twitter.com")
time.sleep(2)

# start scraping tweets
tickerOptDetails = []
tweet_ids = set()  # tweets already processed
print(tweet_ids)


def main():

    # prevent computer from going to sleep
    pyautogui.press('shift')

    print("--checking for new alert...")
    page_cards = driver.find_elements_by_xpath('//article[@data-testid="tweet"]')

    for card in page_cards:
        try:
            ticker = card.find_element_by_xpath('//span/a[starts-with(text(),"$")]').text.replace('$', '')
            optCriteria = card.find_element_by_xpath('//span/a[starts-with(text(),"$")]'
                                                     '/../following-sibling::span').text.split('\n')[0]\
                .replace('-', '').replace('$', '')
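            # match either the ox (1f402) or bear (1f43b) emoji image inside the tweet card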
            emoji = card.find_element_by_xpath("//img[contains(@src,'https://abs-0.twimg.com/emoji/v2/svg/1f402.svg')"
                                               " or contains(@src,'https://abs-0.twimg.com/emoji/v2/svg/1f43b.svg')]")\
                .get_attribute("title")

            tradeCriteria = str(ticker+optCriteria)
        except NoSuchElementException:
            continue

        if tradeCriteria:
            tweet_id = ' '.join(tradeCriteria)
            if tweet_id not in tweet_ids:
                tweet_ids.add(tweet_id)
                if 13 < len(tradeCriteria) < 22 and re.search(r'\d{8} \D ', tradeCriteria):
                    print(tradeCriteria)
                    print(emoji)

main()

But then the following code will return an emoji attribute...

from selenium.webdriver import Chrome
import time
from selenium.common.exceptions import NoSuchElementException
import re


# open webpage and allow time to load entirely
driver = Chrome()
driver.get("https://twitter.com")
time.sleep(2)

# start scraping tweets
tickerOptDetails = []
emojiSet = []
tweet_ids = set()
last_position = driver.execute_script("return window.pageYOffset;")
scrolling = True
print(tweet_ids)

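# keep scrolling until three attempts in a row fail to move the page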
while scrolling:
    page_cards = driver.find_elements_by_xpath('//article[@data-testid="tweet"]')
    for card in page_cards:
        try:
            ticker = card.find_element_by_xpath('//span/a[starts-with(text(),"$")]').text.replace('$', '')
            optCriteria = card.find_element_by_xpath('//span/a[starts-with(text(),"$")]'
                                                     '/../following-sibling::span').text.split('\n')[0].replace('-', '').replace('$', '')
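            # same ox (1f402) / bear (1f43b) emoji lookup as in the first file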
            emoji = card.find_element_by_xpath("//img[contains(@src,'https://abs-0.twimg.com/emoji/v2/svg/1f402.svg') or"
                                               " contains(@src,'https://abs-0.twimg.com/emoji/v2/svg/1f43b.svg')]")\
                .get_attribute("title")
            
            tradeCriteria = str(ticker+optCriteria)
        except NoSuchElementException:
            continue

        if tradeCriteria:
            tweet_id = ''.join(tradeCriteria)
            if tweet_id not in tweet_ids:
                tweet_ids.add(tweet_id)
                if 13 < len(tradeCriteria) < 22 and re.search(r'\d{8} \D ', tradeCriteria):

                    print(tradeCriteria)
                    print(emoji)

    scroll_attempt = 0
    while True:
        # scroll to the bottom of the page, then check whether the position moved
        driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
        time.sleep(2)
        curr_position = driver.execute_script("return window.pageYOffset;")
        if last_position == curr_position:
            scroll_attempt += 1

            if scroll_attempt >= 3:
                scrolling = False
                break
            else:
                time.sleep(2)
        else:
            last_position = curr_position
            break

print(tweet_ids)

I know I've added the scrolling to the second code, so it's looking at the entire page and returning the elements I'm looking for, but other than that the two are more or less the same. I can run the first code every few seconds and it will never find the emoji element. It finds the ticker and optCriteria no problem and prints them together as the tradeCriteria, but it never finds the emoji attribute even when it's there.

I tried both an implicit wait and an explicit wait, but neither one worked. I also tried moving the emoji XPath lookup inside the if 13 < len(tradeCriteria) < 22 and re.search(r'\d{8} \D ', tradeCriteria): statement, but that didn't work either.
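The explicit wait attempt was along these lines (a rough sketch; the timeout value is approximate and the locator is the same emoji XPath as above):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

emoji_xpath = ("//img[contains(@src,'https://abs-0.twimg.com/emoji/v2/svg/1f402.svg')"
               " or contains(@src,'https://abs-0.twimg.com/emoji/v2/svg/1f43b.svg')]")

# wait up to 15 seconds for an emoji <img> to be present, then read its title
emoji = WebDriverWait(driver, 15).until(
    EC.presence_of_element_located((By.XPATH, emoji_xpath))
).get_attribute("title")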

S. Price
  • Are you aware that line 42 of the first file (the if tradeCriteria: line) has a syntax error? I don't imagine this is the cause, as the Python interpreter would fail with that error early on. – W-B Aug 19 '22 at 07:34
  • I also believe that on line 22 you mean to use `pyautogui.keyDown` rather than `pyautogui.press`; the difference is that one taps the key once while the other holds it down, which gives you the desired effect of keeping the screen awake. – W-B Aug 19 '22 at 07:38
  • The first code contains a code-breaking indentation error; can you paste the code as base64 as well? ```File "/home/hans/lol.py", line 42 if tradeCriteria: ^ IndentationError: unindent does not match any outer indentation level``` – hanshenrik Aug 25 '22 at 13:37
  • Your code seems to be incompatible with Selenium >= 4.3.0; see https://stackoverflow.com/a/72754667/1067003 and the sketch below. – hanshenrik Aug 25 '22 at 14:02
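For context on that last comment: Selenium 4.3.0 removed the find_element_by_* / find_elements_by_* helpers, so under a current Selenium the lookups need the By-locator form. A minimal before/after sketch using the tweet-card XPath from the question:

from selenium.webdriver.common.by import By

# Selenium < 4.3.0, as written in the question
page_cards = driver.find_elements_by_xpath('//article[@data-testid="tweet"]')

# Selenium >= 4.3.0 equivalent
page_cards = driver.find_elements(By.XPATH, '//article[@data-testid="tweet"]')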

1 Answer


After plugging your two files into a comparison checker, it seems the tweet_id lines differ by a single space:

first file: tweet_id = ' '.join(tradeCriteria)
second file: tweet_id = ''.join(tradeCriteria)

Because tradeCriteria is a string, join iterates over its characters, so the first version inserts a space between every character while the second does not:

first file: a b c

second file: abc
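You can confirm the difference in a quick REPL session:

>>> ' '.join('abc')
'a b c'
>>> ''.join('abc')
'abc'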

Seeing how the print(emoji) statement only runs after the if tweet_id not in tweet_ids: check in both files, I think this difference is what is causing the problem in the first file.

Alternatively, if you are scraping data from Twitter, you can try using the official Twitter API with a Python wrapper such as Tweepy, as it is slightly easier. You can learn more about how to do that here.
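As a rough sketch of that route (the credentials and screen name below are placeholders you would replace with your own):

import tweepy

# placeholder credentials from a Twitter developer account
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth)

# fetch recent tweets from one account instead of scraping the rendered page
for status in api.user_timeline(screen_name="some_account", count=20, tweet_mode="extended"):
    print(status.full_text)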