
I'm learning how to scrape data with Selenium, but I've found that when a website is opened through Selenium, it looks different from what I get in a normal browser, even after I add headers, and I'm very confused. I'd like to upload two screenshots for comparison, but I can't upload images to Stack Overflow at present. I even tried opening the browser through ChromeDriver and entering the address manually, but the result is still different.

I use Python 3.6, Selenium, and Chrome 75.0.3770.80.

from selenium import webdriver
driver = webdriver.Chrome()  # create the driver instance
url = 'https://www.free-ss.ooo'
driver.get(url)

I can't post pictures on Stack Overflow yet, but I just want to figure out how to make Selenium load the page the way a normal browser does.

  • In what way do the pages opened through _Selenium_ look **ab**normal to you? – undetected Selenium Jun 06 '19 at 06:03
  • It's perfectly normal, but the table data on the website is different. – Funny Chen Jun 06 '19 at 06:04
  • Which data? How are they different? Expected and Actual table data? Please update the question. – undetected Selenium Jun 06 '19 at 06:06
  • The website I want to scrape has a data table, and the data I see in that table when I use Selenium is different from what I see when I open the site directly in a normal browser. The rest of the site looks the same. I tried emulating mobile access in the ChromeDriver options, but the data I got was still different. It seems that the website can recognize that it is being accessed by a program through Selenium and serves different data. – Funny Chen Jun 06 '19 at 06:09
  • @FunnyChen In your normal browser there are stored cookies, so the content provided in the table may be tailored to you. Selenium, however, opens a fresh browser session with no cookies. You can verify this by opening the link in an incognito window and comparing it with what Selenium shows. – Ja8zyjits Jun 06 '19 at 06:35
  • Possible duplicate of [Inspect Element and View Source Code are showing two different things](https://stackoverflow.com/questions/18244908/inspect-element-and-view-source-code-are-showing-two-different-things). Basically, don't expect the data returned via a view-source: URI to match the data from the devtools. – orde Jun 06 '19 at 06:47
  • @Ja8zyjits First of all, thank you very much. I tried Chrome's Incognito mode, but I still got the normal web page there. – Funny Chen Jun 06 '19 at 07:05
  • @orde No, I'm not talking about Inspect Element and View Source Code showing two different things. I'm talking about pages that are directly visible to the eye, so I'm very confused. It's not JSON loading. – Funny Chen Jun 06 '19 at 07:08

2 Answers

0

Aha, I found the problem: the target site really was detecting Selenium. The solution is to add this option:

chrome_options.add_experimental_option('excludeSwitches', ['enable-automation'])
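For context, here is a minimal sketch of how that option might be wired into a complete script. The helper names `stealth_experimental_options` and `fetch_page_source` are hypothetical and only used for illustration; the Selenium import is done lazily inside the function so the option dictionary can be inspected on its own. Whether this is enough to get the normal table data depends on how the site detects automation.

```python
def stealth_experimental_options():
    """Return the experimental options used to hide the automation switch.

    Hypothetical helper: it only packages the option named in the answer.
    """
    return {"excludeSwitches": ["enable-automation"]}


def fetch_page_source(url):
    """Open `url` in Chrome with the options above and return the page HTML.

    Hypothetical helper; it assumes Chrome and a matching chromedriver
    are installed and reachable.
    """
    # Imported here so the helper above works without Selenium installed.
    from selenium import webdriver

    chrome_options = webdriver.ChromeOptions()
    for name, value in stealth_experimental_options().items():
        chrome_options.add_experimental_option(name, value)

    driver = webdriver.Chrome(options=chrome_options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()
```

Calling `fetch_page_source('https://www.free-ss.ooo')` would then load the page from the question with the `enable-automation` switch excluded.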

  • You can mark your answer as accepted. Meanwhile can you try headless mode and verify if your site is able to detect the bot? `chrome_options = webdriver.ChromeOptions() chrome_options.headless = True driver = webdriver.Chrome(options=chrome_options)` – Ja8zyjits Jun 06 '19 at 13:03
0

I faced the same issue and was able to resolve it by removing or fixing the user-agent argument; after that it worked fine in both headless and non-headless mode.

The resolution was inspired by PDHide's post.
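As a rough illustration of "fixing" the user-agent argument, the sketch below builds a `--user-agent=` switch and refuses a string containing `HeadlessChrome`, the marker headless Chrome puts in its default user agent and one that sites can check for. The helper names `user_agent_argument` and `build_options`, and the example user-agent string, are assumptions made for illustration; they are not taken from the referenced post.

```python
# Example value only: a desktop Chrome 75 style user agent (assumption).
EXAMPLE_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36"
)


def user_agent_argument(user_agent):
    """Build the Chrome command-line switch for a custom user agent.

    Hypothetical helper: rejects strings that still advertise headless mode.
    """
    if "HeadlessChrome" in user_agent:
        raise ValueError("user agent still identifies the browser as headless")
    return "--user-agent=" + user_agent


def build_options(user_agent=EXAMPLE_USER_AGENT, headless=False):
    """Return ChromeOptions carrying the user agent, optionally headless."""
    # Imported here so the helper above works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    if headless:
        options.add_argument("--headless")
    options.add_argument(user_agent_argument(user_agent))
    return options
```

The resulting object could be passed as `webdriver.Chrome(options=build_options(headless=True))`. Dropping the custom argument entirely is the other route the answer mentions.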

Rola