
Short summary of my problem: I am trying to use a 'for n in range' loop to repeat a set of instructions, but I cannot get the variable names inside the loop to include the value of n (page1_title, page2_title, and so on).

Background: I am trying to scrape news items from an intranet site. My task is to collect the title, URL and publication date of many news items (one item per page).

The problem (with code): using Selenium, I got everything working for 20 pages (opening the browser, logging in, accepting cookies, scraping everything needed from a page, clicking through to the next page, and writing the collected data to a file) by repeating the code 20 times, like so:

import time
from selenium.webdriver.common.by import By

# GRAB NEWS 1 AND CLICK NEXT
time.sleep(3)
page1_title = driver.title
page1_url = driver.current_url
page1_date = driver.find_element(By.XPATH, '//*[@id="root"]/div/main/div/div/div/div/div/article/div[2]/div/div/div/div/div[1]').text
scroll_click_older()  # a pre-defined scroll-and-click function
print('Grabbed all and clicked on "Older" button1')

# GRAB NEWS 2 AND CLICK NEXT
time.sleep(3)
page2_title = driver.title
page2_url = driver.current_url
page2_date = driver.find_element(By.XPATH, '//*[@id="root"]/div/main/div/div/div/div/div/article/div[2]/div/div/div/div/div[1]').text
scroll_click_older()  # a pre-defined scroll-and-click function
print('Grabbed all and clicked on "Older" button2')

...and so on, up to news item 20.
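The twenty repeated blocks differ only in the number in the variable names, so one way to remove the repetition is to wrap a single block in a function that returns the three values for whatever page is currently open. This is a minimal sketch, assuming `driver` behaves like a Selenium WebDriver (it has `.title`, `.current_url` and `.find_element(by, value)`); the function name `grab_current_item` and its parameters are hypothetical, and the default `by="xpath"` is the string value that Selenium's `By.XPATH` holds.

```python
import time

# XPath of the publication date, copied from the question
DATE_XPATH = '//*[@id="root"]/div/main/div/div/div/div/div/article/div[2]/div/div/div/div/div[1]'


def grab_current_item(driver, by="xpath", date_xpath=DATE_XPATH, wait=3):
    """Hypothetical helper: return title, URL and date of the page open in `driver`.

    `driver` is assumed to behave like a Selenium WebDriver; `by` defaults to
    "xpath", the value of selenium.webdriver.common.by.By.XPATH.
    """
    time.sleep(wait)  # same fixed pause as the original code
    return {
        "title": driver.title,
        "url": driver.current_url,
        "date": driver.find_element(by, date_xpath).text,
    }
```

In the real script, each repeated block would then shrink to a call such as `grab_current_item(driver, By.XPATH)` followed by `scroll_click_older()`.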

Then I write it all to a file like so:

# OPEN, WRITE, CLOSE FILE - write all collected data to a file
text_file = open("news.csv", "wt")
text_file.write(page1_date + ',' + page1_url + ',' + page1_title + "\n")
text_file.write(page2_date + ',' + page2_url + ',' + page2_title + "\n")
text_file.close()
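Joining the fields with ',' by hand produces a broken CSV line whenever a title itself contains a comma. Python's standard `csv` module quotes such fields automatically. A sketch, assuming the scraped data has already been collected into a list of `(date, url, title)` tuples; the function name `write_news_csv` is hypothetical.

```python
import csv


def write_news_csv(path, rows):
    """Hypothetical helper: write (date, url, title) tuples to `path`, one item per line."""
    # newline="" is what the csv module documentation recommends for writer files
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerows(rows)  # fields that contain commas are quoted automatically
```

The `with` block also closes the file on its own, even if an error occurs while writing.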

I also repeated the write line 20 times so that the data from all 20 news items ends up in the file. Now that this works, I would like to be able to specify in the terminal how many news items to scrape, and then scrape and write exactly that many. So I tried a 'for loop', like this:

# Get number to know how many news items to scrape
amount_to_scrape = int(input("How many news items to scrape?: "))

# Grab title, URL and publication date of the news item
for n in range(amount_to_scrape):
    page(n)_title = driver.title
    print('Grabbed page' + str(n) + ' title')
    page(n)_url = driver.current_url
    print('Grabbed page' + str(n) + ' URL')
    page(n)_date = driver.find_element(By.XPATH,
                                     '//*[@id="root"]/div/main/div/div/div/div/div/article/div[2]/div/div/div/div/div[1]').text
    print('found and grabbed news' + str(n) + ' date')
    scroll_click_older()
    print('Clicked on "Older" button' + str(n) + ' now moving to next page')

The problem now is that it doesn't work: PyCharm says that adding 'n' to functions like this is not valid, but I don't know the correct way to do it.

I cannot find a solution, as I don't know how to describe it, so finding answers is hard. If anyone can point me to a solution, that would be great.
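A name like `page(n)_title` cannot be built at runtime, because Python variable names are fixed when the code is written. The usual approach is to keep one container and put each page's data into it. A minimal sketch using a list, assuming `grab` and `click_next` are callables standing in for the question's three scraping lines and for `scroll_click_older()`; the function `scrape_news` and both parameter names are hypothetical.

```python
def scrape_news(amount, grab, click_next):
    """Hypothetical helper: scrape `amount` news items and return them as a list.

    grab() should return the data for the page that is currently open;
    click_next() should move the browser to the next (older) item.
    """
    items = []
    for n in range(amount):
        items.append(grab())  # items[n] takes the place of page(n)_title etc.
        print('Grabbed news item ' + str(n + 1))
        click_next()
    return items
```

With `grab` returning a `(title, url, date)` tuple, `items[0]` replaces `page1_title`, `page1_url` and `page1_date`, and `len(items)` always matches the number typed in at the prompt.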

Edit (after the question was closed): Reply to Karl: Thanks. You write: "Welcome to Stack Overflow. It's hard to understand the question, because you are trying to use words to mean things that they don't actually mean. "adding 'n' to functions" makes absolutely no sense."

And this is exactly why I cannot find an answer to my question, as I wrote: "I cannot find a solution, as I don't know how to describe it, so finding answers is hard."

Reply to Henro, who wrote: "do you found any error or what?" Yes, the error is:

    page(n)_title = driver.title
    ^^^^^^
SyntaxError: invalid syntax
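The caret points at `page(n)` because Python reads it as a call of a function named `page` with the argument `n`; the `_title` directly after it is then a stray name, and even without it a function call cannot be the target of an assignment. This can be checked without running the scraper by compiling the line with the built-in `compile()`; the helper name `is_valid_python` is only for illustration.

```python
def is_valid_python(source):
    """Hypothetical helper: True if `source` compiles, False on SyntaxError."""
    try:
        compile(source, "<test>", "exec")  # compiles only, nothing is executed
    except SyntaxError:
        return False
    return True


print(is_valid_python("page(n)_title = driver.title"))  # False
print(is_valid_python("page(n) = driver.title"))        # False: cannot assign to a call
print(is_valid_python("page_title = driver.title"))     # True
```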

I see my question has been closed. I don't understand the answers in the linked 'duplicate', but I will read up on dictionaries as Tom suggested.
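Following the dictionary suggestion, the page number can become a key, so that `news[1]["title"]` takes the place of `page1_title`. A sketch under the assumption that the scraped values are already available as strings; the sample titles, URLs and dates below are made up for illustration and stand in for `driver.title`, `driver.current_url` and the date element's text.

```python
news = {}  # maps page number -> dict holding that page's data

# made-up sample values; in the scraper these would come from the driver
scraped = [
    ("Title A", "https://intranet.example/a", "15-09-2022"),
    ("Title B", "https://intranet.example/b", "14-09-2022"),
]

for n, (title, url, date) in enumerate(scraped, start=1):
    news[n] = {"title": title, "url": url, "date": date}

print(news[1]["title"])  # Title A      (was page1_title)
print(news[2]["date"])   # 14-09-2022   (was page2_date)
```

Because the keys start at 1, the numbering matches the original `page1_...`, `page2_...` variables, and the dictionary grows to whatever size the loop runs.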

Thanks all!

AtonofPy
  • Welcome to Stack Overflow. It's hard to understand the question, because you are trying to use words to mean things that they don't actually mean. "adding 'n' to functions" makes absolutely no sense. It **looks like** what you mean is "interpolating the numeric value of `n`, interpreted as text, into a variable name". This is doable, but a bad idea. Please see the linked duplicate. – Karl Knechtel Sep 15 '22 at 11:01
  • Read up on dictionaries https://docs.python.org/3/tutorial/datastructures.html#dictionaries – Tom McLean Sep 15 '22 at 11:02
  • do you found any error or what? or may you can try this one: in this line: 'page' + str(n) + '_title' = driver.title – Henro Sutrisno Tanjung Sep 15 '22 at 11:08
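The last comment builds the name as a string, but a string expression cannot be assigned to either, so that line is also rejected with a SyntaxError. The same idea does work when the built string is used as a dictionary key. A sketch, with `data` as a hypothetical name and a plain string standing in for `driver.title`:

```python
data = {}
n = 1
title = "Some news title"  # stands in for driver.title

# Invalid: 'page' + str(n) + '_title' = title   -> SyntaxError
# Valid: use the built string as a dictionary key instead
data['page' + str(n) + '_title'] = title

print(data['page1_title'])  # Some news title
```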
