how to get link of an image from twitter if the image is isnide a link

Question

I am trying to get the image links from Twitter posts. When I tried to do this with the Twitter API, I only get the link of images if they are posted as images but most newspapers post a link to their whole report including an image. I am trying to get the image inside that link. This is normally the basic way to get the link example. However, in my case, an image can be hidden under a link like this example from twitter. I have tried solutions like beautifulsoup but the HTML response I get doesn't make any sense. It doesn't include any of the attributes or classes I am searching for. This is the code I tried:

    url = "https://twitter.com/derspiegel/status/1511961740694614016"
    res = requests.get(url)
    
    soup = BeautifulSoup(res.content, "html.parser")
    print(soup)
    comicElem = soup.find_all(class_="css-9pa8cd")
    print(comicElem)

Does anyone have an idea to get this done with Twitter API or any other method?

Did you try with the twitter API or logging in? The scrambled HTML response is because Twitter renders the page using Javascript, you can test it by executing: `print(soup.find('p').text)` right after soup — Fabio Craig Wimmer Florey, Apr 11 '22 at 00:55
I get this "We’ve detected that JavaScript is disabled in this browser. Please enable JavaScript or switch to a supported browser to continue using twitter.com. You can see a list of supported browsers in our Help Center." How to fix this? — , Apr 11 '22 at 01:00
You really can't circumvent it with the requests module (https://github.com/gottfrois/link_thumbnailer/issues/141) but here's a solution (partial) because twitter is against unauthorized bots: https://stackoverflow.com/questions/71444292/how-to-fix-javascript-is-not-available-output-when-web-scraping-dynamic-sites-w as for your question, it's a duplicate now :) — Fabio Craig Wimmer Florey, Apr 11 '22 at 01:03
Thank you for your response. Do you know a way to do this with Twitter API? or it is impossible? — , Apr 11 '22 at 01:06
you may have the most common problem: page uses `JavaScript` to add/update elements but `BeautifulSoup`/`lxml`, `requests`/`urllib` can't run `JS`. You may need [Selenium](https://selenium-python.readthedocs.io/) to control real web browser which can run `JS`. OR use (manually) `DevTools` in `Firefox`/`Chrome` (tab `Network`) to see if `JavaScript` reads data from some URL. And try to use this URL with `requests`. `JS` usually gets `JSON` which can be easy converted to Python dictionary (without `BS`). You can also check if page has (free) `API` for programmers. — furas, Apr 11 '22 at 01:38

score -1 · Answer 1 · answered Apr 11 '22 at 02:33

Use selenium that's example cope tutorials website :

from selenium import webdriver
#set chromedriver.exe path
driver = webdriver.Chrome(executable_path="C:\\chromedriver.exe")
driver.implicitly_wait(0.5)
#maximize browser
driver.maximize_window()
#launch URL
driver.get("https://www.tutorialspoint.com/index.htm");
#open file in write and binary mode
with open('Logo.png', 'wb') as file:
#identify image to be captured
   l = driver.find_element_by_xpath('//*[@alt="Tutorialspoint"]')
#write file
   file.write(l.screenshot_as_png)
#close browser
driver.quit()

Copying image by id Link to tutorial

how to get link of an image from twitter if the image is isnide a link

1 Answers1