
I've been working on a project where a site shows me three numbers as pictures. I quickly noticed that the developers forgot to rename the PNG files in the HTML source, so I learned to read the digits from those filenames with code.

I have already set up a script with Selenium and BeautifulSoup that opens the site in a Chrome window, gives me time to log in, reads the HTML source as text, finds the numbers, inserts them into the specified field, clicks the continue button, and then loops.
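In isolation, the digit-reading step boils down to something like this helper (the `digits_from_sources` name and the `<digit>.png` filename pattern are just for illustration, based on the description above):

```python
import re

def digits_from_sources(sources):
    """Pull the single digit out of image filenames such as 'img/3.png'.

    Assumes each number image is named '<digit>.png'; sources whose
    names do not match are skipped.
    """
    result = []
    for src in sources:
        match = re.search(r'(\d)\.png$', src)
        if match:
            result.append(int(match.group(1)))
    return result

print(digits_from_sources(['img/3.png', 'img/7.png', 'logo.png', 'img/1.png']))
# [3, 7, 1]
```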

import time
import re
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By

driver = webdriver.Chrome(executable_path="C:/Users/user/Desktop/Personal/PythonScripts/chromedriver.exe")
driver.get('URL I USED')

time.sleep(20)
driver.refresh()

soup = BeautifulSoup(driver.page_source, "lxml")
all_items = soup.find_all('img')

for item in all_items:
    print('char:', item['src'][-5])

LIST = [item['src'][-5] for item in all_items]

new_list = []
for value in LIST:
    try:
        new_list.append(int(value))
    except ValueError:
        continue

for i in new_list: 
    print(i, end="")

try:
    driver.find_element_by_tag_name('input').send_keys(new_list[0], new_list[1], new_list[2])
except Exception:
    print('Fail')

html_source = driver.page_source
print(html_source)

button = driver.find_element_by_xpath('//*[@id="main"]/form/div[3]/input')
button.click()

time.sleep(5)
while True:
    for item in all_items:
        print('char:', item['src'][-5])

    LIST = [item['src'][-5] for item in all_items]

    new_list = []
    for value in LIST:
        try:
            new_list.append(int(value))
        except ValueError:
            continue

    for i in new_list:
        print(i, end="")

    try:
        driver.find_element_by_tag_name('input').send_keys(new_list[0], new_list[1], new_list[2])
    except Exception:
        print('Fail')

    html_source = driver.page_source
    print(html_source)

    new_list.clear()
    button.click()
    time.sleep(10)
    driver.refresh()

I am fairly new to Python and coding in general, so feel free to point out my mistakes even if they are unrelated to the topic. My issue is that the page doesn't necessarily reload after pressing continue. I tried not reloading it, but then my code "reads" the first survey correctly and fails on the following ones, filling the blank with the previously used numbers (only the first set). I added more than enough sleep time and removed the driver.refresh() at the end of the loop. Neither worked.

Thanks in advance

  • Are you aware of the fact that, if your requirement is to use [Selenium](https://stackoverflow.com/questions/54459701/what-is-selenium-and-what-is-webdriver/54482491#54482491), then by using [Beautifulsoup](https://stackoverflow.com/questions/47983495/python-which-is-considered-better-for-scrapping-selenium-or-beautifulsoup-wit/47983631#47983631) you are underutilizing the power of Selenium. – undetected Selenium Feb 20 '20 at 07:38
  • Is it worth scrapping some parts to add Selenium, or should I leave it with BeautifulSoup at this point? – DismissedFetus Feb 20 '20 at 11:04
  • Honestly, I don't practice BS as Selenium is self-sufficient. But if the webpages are static, you should definitely move to Requests and BS. If the pages are dynamic, stick with Selenium. – undetected Selenium Feb 20 '20 at 11:08
  • The page doesn't reload after sending the survey; only the .png values change. Would that be dynamic or static? I just tried to incorporate the code from the answer below and it failed, probably because of my lack of experience. I am keen on keeping BS but will change if necessary! – DismissedFetus Feb 20 '20 at 11:15

1 Answer


I would recommend changing the script to end the driver session after the survey is sent. Just start the while loop before starting the driver.
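Sketched out, that restructuring could look like the following. It needs a live browser, so it is untested here; the chromedriver path and URL are placeholders taken from the question, and the survey-handling body is elided:

```python
import time
from selenium import webdriver

while True:
    # Start a fresh driver session for each survey so no stale page
    # state (or stale element lists) carries over between iterations.
    driver = webdriver.Chrome(executable_path="C:/Users/user/Desktop/Personal/PythonScripts/chromedriver.exe")
    driver.get('URL I USED')
    time.sleep(20)  # time to log in and for the page to load

    # ... read the image digits, fill the input, click continue ...

    driver.quit()  # end the session once the survey is sent
```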

On the other hand, you are using BeautifulSoup only to get the images; you can change it to

driver.find_elements_by_tag_name('img')

to use Selenium throughout the script. find_elements_by_tag_name will return a list of elements.
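Re-querying inside the loop also addresses the symptom described in the question: the original code builds all_items once, before the loop, and keeps reusing that first result. A minimal sketch of fetching fresh elements each pass (browser-dependent, so untested here; assumes a driver is already on the page and uses the same old-style Selenium API as the rest of the thread):

```python
while True:
    # Fetch the images fresh on every pass, so the digits reflect the
    # page as it is now rather than the list parsed before the loop.
    images = driver.find_elements_by_tag_name('img')
    digits = [int(img.get_attribute('src')[-5])
              for img in images
              if img.get_attribute('src') and img.get_attribute('src')[-5].isdigit()]
    # ... send the digits to the input and click continue ...
```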