I am new to Python and managed to write a small program (using Python 3) to retrieve information from a website. I have three problems:

  1. I do not know how to tell Python to wait at every 80th step, i.e. when i = 80, 160, 240, etc.
  2. I do not know how to tell Python to retrieve the total number of steps from the website (it varies from page to page); see the image below. In the picture, the maximum of 260 looks "hard-coded" for this example. Can I tell Python to retrieve the 260 by itself (or whatever number another web page shows)?
  3. How can I tell Python to check which page the script starts on, so that it can set i to that page's number? Normally I assume I start at page 0 (i = 0), but if I were to start at page 30, the script should set i = 30, or if I start at page 200, it should set i = 200, and so on, before it enters the while loop.

Is it clear what I am struggling with?

enter image description here

This is the pseudo code:

import time
from selenium import webdriver

url = input('Please, enter url: ')

driver = webdriver.Firefox()
driver.get(url)

i = 0

while i > 260: # how to determine (book 1 = 260 / book 2 = 500)?
    # do something
    if i == 80: # each 80th page?
        pass # pause
    else:
        pass # do something else
    i = i + 1
else:
    quit()
Til Hund
  • Can you explain your third question? – Or Duan Apr 22 '17 at 07:11
  • I edited my 3rd question. I hope it is clearer now. Sometimes it is hard to explain what one wants. ;) I have little time right now, but I will check your answers later today. Thank you all for answering! – Til Hund Apr 22 '17 at 09:56

2 Answers

1) sleep

import time
....     
    if i % 80 == 0: # each 80th page?
        # Wait for 5 seconds
        time.sleep(5)

2) element selectors

html = driver.find_element_by_css_selector('afterInput').get_attribute('innerHTML')

3) arguments

import sys
....
currentPage = int(sys.argv[2])  # command-line arguments arrive as strings, so convert to int

or extract it from the source (see 2)
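
As a hedged sketch of the command-line route: the standard-library argparse module can read the URL and an optional start page and convert the page number to an integer. The option name --start-page and the example URL are made up for illustration.

```python
import argparse


def parse_args(argv=None):
    """Read the URL and an optional start page from the command line."""
    parser = argparse.ArgumentParser(description="Walk through the pages of a book.")
    parser.add_argument("url", help="address of the page to open")
    # --start-page is a hypothetical option name; it defaults to page 0.
    parser.add_argument("--start-page", type=int, default=0,
                        help="page number the script starts on")
    return parser.parse_args(argv)


# Passing a list stands in for a real command line here.
args = parse_args(["http://example.com/book", "--start-page", "30"])
i = args.start_page
print(args.url, i)
```

In a real script, calling parse_args() without an argument makes argparse read sys.argv itself.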

user3804188
  • Thank you very much for answering, user3804188. Unfortunately, the element selector does not work; it gives me the following error message: `selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: afterInput`. – Til Hund Apr 22 '17 at 19:58
  • This means that the element you are looking for is not there. Check the HTML source code (driver.page_source). – user3804188 Apr 23 '17 at 06:22

First, if you want to know whether your i is a multiple of 80, you can use the modulo operator (%) and check whether the result equals 0, for instance:

if i % 80 == 0:
    time.sleep(1) # One second
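
One detail worth noting: 0 % 80 is also 0, so a loop that starts at i = 0 would pause right away. A minimal sketch that skips step 0 (the helper name is_pause_step is only illustrative):

```python
def is_pause_step(i, interval=80):
    """Return True on every `interval`-th step (80, 160, 240, ...), but not at 0."""
    return i > 0 and i % interval == 0


# Steps 0..260 inclusive, as in the 260-page example from the question.
pause_steps = [i for i in range(261) if is_pause_step(i)]
print(pause_steps)  # [80, 160, 240]
```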

Second, you need to query the HTML you receive from the server, for instance:

from selenium import webdriver

url = input('Please, enter url: ')

driver = webdriver.Firefox()
driver.get(url)
total_pages = int(driver.find_element_by_css_selector('afterInput').get_attribute('innerHTML').split()[1])  # Take only the number and convert it to int
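
A bare name such as 'afterInput' is a CSS type selector, so it only matches an <afterInput> tag; if the page uses that name as a class or an id, the selector would need to be '.afterInput' or '#afterInput'. Which one applies depends on the page's HTML, which is not shown here, so the selector in the comment below is an assumption. The sketch also assumes that the element's text ends with the total (for example "of 260") and waits for the element to appear before reading it. The function names are illustrative, and the Selenium imports sit inside the function so that the parsing helper can be tried without a browser.

```python
import re


def extract_total_pages(text):
    """Return the last integer found in text such as 'of 260' (assumed format)."""
    numbers = re.findall(r"\d+", text)
    if not numbers:
        raise ValueError("no page count found in %r" % text)
    return int(numbers[-1])


def read_total_pages(driver, css_selector, timeout=10):
    """Wait until the element is present, then parse the total from its innerHTML."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    element = WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
    )
    return extract_total_pages(element.get_attribute("innerHTML"))


# With a running driver (selector is a guess, see above):
#     total_pages = read_total_pages(driver, ".afterInput")
print(extract_total_pages("of 260"))  # 260
```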

After your edit: all you have to do is assign i the value you want, either by defining a variable in your script, by parsing the arguments from the command line, or by scraping it from the website. Which one fits depends on your implementation and needs.

Other notes

I know you're just getting started, but if you want to improve your code and make it a bit more pythonic, I would make the following changes:

  • Using while and i = i + 1 is not a common pattern in Python; use for i in range(total_pages) instead. Of course, you need to know the number of pages first (see your second question).
  • There is no need to call quit(); the script ends anyway when it reaches the end of the file.
  • I think you meant while i < 260.
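
Putting these notes together, the loop might look roughly like this. The start page, the total, and the function name walk_pages are placeholders, and the real page handling is reduced to a list append:

```python
import time


def walk_pages(start_page, total_pages, pause_every=80, pause_seconds=0.0):
    """Visit pages start_page..total_pages-1 and pause on every pause_every-th page."""
    visited = []
    paused_at = []
    for i in range(start_page, total_pages):
        visited.append(i)  # placeholder for the real work on page i
        if i > 0 and i % pause_every == 0:
            time.sleep(pause_seconds)  # use e.g. 5 for a real pause
            paused_at.append(i)
    return visited, paused_at


visited, paused_at = walk_pages(start_page=30, total_pages=260)
print(visited[0], visited[-1])  # 30 259
print(paused_at)  # [80, 160, 240]
```

range(start_page, total_pages) covers both the start offset and the upper bound, so no manual counter or quit() call is needed.
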
Or Duan
  • Thank you very much for your answer, Or Duan. What do I write instead of `quit()`? Nothing? – Til Hund Apr 22 '17 at 10:50
  • That's right, there is no point in calling it at the end; the script will end anyway. – Or Duan Apr 22 '17 at 13:46
  • Is `while i < 260` the same as `for i in range(total_pages)`? Unfortunately, the `total_pages` line does not work. :/ It tells me `selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: afterInput`. – Til Hund Apr 22 '17 at 19:57
  • Do I understand correctly that `if i % 80 == 0:` does something at every 80th step? – Til Hund Apr 22 '17 at 20:03
  • With regards to the error message, I read [this post on stackoverflow](http://stackoverflow.com/questions/27112731/selenium-common-exceptions-nosuchelementexception-message-unable-to-locate-ele) and tried the check marked solution, but the same error message appears. – Til Hund Apr 22 '17 at 20:11
  • The two loops are equivalent in your case; the latter is more "pythonic", and total_pages is just a number (let's say 260). Your error is a different issue; you might read [this](http://stackoverflow.com/questions/25850842/finding-element-with-explicit-wait-using-selenium-webdriver-in-python) to understand how to wait for an element to be present. For `i % 80`, please read [this](http://stackoverflow.com/questions/4432208/how-does-work-in-python). – Or Duan Apr 24 '17 at 04:55