I'm trying to get all the links connected to each image in this webpage.
I can get all the links if I let a selenium script scroll downward until it reaches the bottom. One such link that I wish to scrape is this one.
Now, my goal here is to parse all those links using requests. What I have notice that the links that I want to parse are built using such B-uPwZsJtnB
shortcode.
However,
I'm trying to scrape those different shortcode
available in a script tag found in page source in that webpage. There are around 600 shortcodes
in that page. The script that I've created can parse only the first 70
such shortcode
which ultimately can built 70 qualified links.
How can I grab all 600 links using requests?
I've tried so far with:
import re
import json
import requests
base_link = 'https://www.instagram.com/p/{}/'
lead_url = 'https://www.instagram.com/explore/tags/baltimorepizza/'
with requests.Session() as s:
s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36'
req = s.get(lead_url)
script_tag = re.findall(r"window\._sharedData[^{]+(.*?);",req.text)[0]
for item in json.loads(script_tag)['entry_data']['TagPage']:
tag_items = item['graphql']['hashtag']['edge_hashtag_to_media']['edges']
for elem in tag_items:
profile_link = base_link.format(elem['node']['shortcode'])
print(profile_link)