
This is my current code for scroll + grab links:

scroll_pause_time = 6  # seconds to wait after each scroll; my laptop is a bit slow, so I use 6
screen_height = driver.execute_script("return window.screen.height;")  # height of the screen, in pixels
i = 1


while True:
    # scroll one screen height each time
    driver.execute_script("window.scrollTo(0, {screen_height}*{i});".format(screen_height=screen_height, i=i))  
    i += 1
    time.sleep(scroll_pause_time)
    # re-read the scroll height after each scroll, since it can change as the page loads more content
    scroll_height = driver.execute_script("return document.body.scrollHeight;")
    #driver.execute_script("window.scrollTo(0, {screen_height}*{i} + {extrascroll});".format(screen_height=screen_height, extrascroll=extrascroll, i=i))
    time.sleep(scroll_pause_time)
    links = driver.find_elements(By.CSS_SELECTOR, "a[class='styles__StyledLink-sc-l6elh8-0 ekTmzq Asset--anchor']")
    for link in links:
        file.write(link.get_attribute("href") + '\n')
    # Break the loop when the height we need to scroll to is larger than the total scroll height
    if (screen_height) * i > scroll_height:
        break

The main problem I have with this website, https://opensea.io/collection/embersword-land, is that listings unload as you scroll down the page, so I can't scroll to the bottom and then grab all the links at once.

The second problem is that the div that holds the listings (red) is dynamic: its height and the number of listings it contains change unpredictably, so one pass might grab 40 links and the next only 30, and I end up with duplicates.

The one fix I could think of was to hide the listings I have already grabbed links from, so that new listings move to the top, and then grab the links again, but I don't know how to do that. Any help would be much appreciated! If you need any more info, let me know in the comments. I'm still new to Stack Overflow and to coding, so I'm learning as I go.
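For reference, the hiding idea described above could be sketched roughly as follows. This is only an illustration: `harvest_and_hide` and `HIDE_SCRIPT` are hypothetical names, not part of Selenium. The sketch assumes that setting `display: none` on an element is enough to make the page lay out further listings, which may not hold on a page that re-renders its list. It may also be necessary to hide the enclosing listing card rather than the anchor itself.

```python
# Hypothetical helper names; only get_attribute() and execute_script()
# are real Selenium calls.
HIDE_SCRIPT = "arguments[0].style.display = 'none';"


def harvest_and_hide(driver, elements):
    """Return the hrefs of `elements`, hiding each element after reading it."""
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")
        if href:  # skip anchors without an href
            hrefs.append(href)
        # execute_script passes the element to the page as arguments[0]
        driver.execute_script(HIDE_SCRIPT, element)
    return hrefs
```

In the loop above it would be called with the result of `driver.find_elements(...)` in place of the inner `for` loop, writing the returned hrefs to the file.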

Tzeboys

1 Answer


I suggest collecting all the links in a list, removing the duplicates, and then writing them to the file.

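import time
from selenium.webdriver.common.by import By

# assumes `driver` (a Selenium WebDriver) and `file` (an open text file) already exist
scroll_pause_time = 6  # seconds to wait after each scroll; adjust for your machine and connection
screen_height = driver.execute_script("return window.screen.height;")  # height of the screen, in pixels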
i = 1

links_list = []

while True:
    # scroll one screen height each time
    driver.execute_script("window.scrollTo(0, {screen_height}*{i});".format(screen_height=screen_height, i=i))  
    i += 1
    time.sleep(scroll_pause_time)
    # re-read the scroll height after each scroll, since it can change as the page loads more content
    scroll_height = driver.execute_script("return document.body.scrollHeight;")
    #driver.execute_script("window.scrollTo(0, {screen_height}*{i} + {extrascroll});".format(screen_height=screen_height, extrascroll=extrascroll, i=i))
    time.sleep(scroll_pause_time)
    links = driver.find_elements(By.CSS_SELECTOR, "a[class='styles__StyledLink-sc-l6elh8-0 ekTmzq Asset--anchor']")
    # collect the hrefs in the list; duplicates are removed after the loop
    for link in links:
        links_list.append(link.get_attribute("href"))
    # Break the loop when the height we need to scroll to is larger than the total scroll height
    if (screen_height) * i > scroll_height:
        break



# Remove duplicates while keeping the original order
# (based on https://stackoverflow.com/a/17016257/5226491).
# Relies on dicts preserving insertion order, which is guaranteed from Python 3.7.
links_list = list(dict.fromkeys(links_list))

# Alternative: this also removes duplicates, but does not preserve the order
#links_list = list(set(links_list))

for link in links_list:
    file.write(link + '\n')
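A possible variation, shown only as a sketch, is to deduplicate while scrolling and to stop once several scrolls in a row produce nothing new, rather than relying on the page height alone. The names `collect_unique_links`, `next_batch` and `max_idle_rounds` are made up for this example. `next_batch` stands for any callable that scrolls one step and returns the hrefs currently visible.

```python
from typing import Callable, Iterable, List, Optional


def collect_unique_links(
    next_batch: Callable[[], Iterable[Optional[str]]],
    max_idle_rounds: int = 3,
) -> List[str]:
    """Call next_batch() repeatedly, keeping each href once in first-seen order.

    Stops after max_idle_rounds consecutive batches that add nothing new.
    """
    seen = set()
    ordered = []
    idle_rounds = 0
    while idle_rounds < max_idle_rounds:
        found_new = False
        for href in next_batch():
            if href and href not in seen:  # skip None/empty and repeated hrefs
                seen.add(href)
                ordered.append(href)
                found_new = True
        # reset the counter on progress, otherwise count one more idle round
        idle_rounds = 0 if found_new else idle_rounds + 1
    return ordered
```

With Selenium, `next_batch` could wrap the scroll, the `time.sleep(...)` call and the `find_elements(...)` lookup from the loop above, returning `link.get_attribute("href")` for each element found. The returned list can then be written to the file line by line.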

Max Daroshchanka