
My original html source is below:

<html>
    <head>
        <title> aaaaa<bbbbb </title>
    </head>
    <body>

    </body>
</html>

As you can see there is a mistake in the title. There is an unclosed < between aaaaa and bbbbb.

When I open this page in a web browser (Firefox, Chrome and Edge), the browser fixes the problem and changes the source code to this:

<html>
    <head>
        <title> aaaaa&lt;bbbbb </title>
    </head>
    <body>
    </body>
</html>

So is there a way to prevent browsers from fixing problems in the original HTML? When I browse, I want to see the original HTML source.

Note: I am using Firefox (geckodriver) with Python/Selenium, so any solution that involves a Firefox configuration or Python code would be OK.

undetected Selenium
Yavuz

1 Answer


There are some fundamental differences between the HTML shown through View Source (Ctrl + U) and the markup shown through the Inspector (Ctrl + Shift + I).

Both are browser features that let users look at the HTML of a webpage, but they show different things. View Source shows the HTML exactly as it was delivered from the web server (application server) to the browser. Inspect Element, on the other hand, is a developer tool (e.g. Chrome DevTools) that shows the state of the DOM tree after the browser has applied its error correction and after any JavaScript has manipulated the DOM. The differences may therefore come from:

  • HTML error correction by the browser
  • HTML normalization by the browser
  • DOM manipulation by JavaScript

In short, View Source shows the original HTML, including the JavaScript source as it was delivered, but not the changes made by the browser or by scripts. HTML errors appear corrected only in the Inspect Element tool. As an example:

  • Within View Source you may observe:

    <h1>The title</h2>
    
  • Whereas in Inspect Element it would be corrected to:

    <h1>The title</h1>
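
From Python, the closest equivalent of View Source is to read the response body yourself, without letting any HTML parser touch it. The sketch below uses only the standard library. `fetch_raw_html` is a hypothetical helper name, and the UTF-8 fallback is an assumption for responses that declare no charset. Note that, as far as I know, Selenium's `driver.page_source` returns the browser's serialization of the DOM, so it shows the corrected markup rather than the delivered bytes.

```python
import tempfile
import urllib.request
from pathlib import Path


def fetch_raw_html(url: str) -> str:
    """Return the response body as text, without any HTML parsing."""
    with urllib.request.urlopen(url) as response:
        raw_bytes = response.read()
        # Assumption: fall back to UTF-8 when no charset is declared.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw_bytes.decode(charset, errors="replace")


if __name__ == "__main__":
    # Demo with a local file so the sketch runs without a web server.
    page = Path(tempfile.mkdtemp()) / "page.html"
    page.write_text(
        "<html><head><title> aaaaa<bbbbb </title></head><body></body></html>",
        encoding="utf-8",
    )
    # Prints the markup unchanged, including the unescaped "<".
    print(fetch_raw_html(page.as_uri()))
```

A separate request like this does not share the browser's cookies or session, so for pages behind a login the body may differ from what the Selenium-driven browser received.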
    

This use case

Based on the concept described above, the following markup:

<html>
    <head>
    <title> aaaaa<bbbbb </title>
    </head>
    <body>

    </body>
</html>

gets corrected as:

<html>
    <head>
    <title> aaaaa&lt;bbbbb </title>
    </head>
    <body>
    </body>
</html>
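
The reason the result contains `&lt;` is that the parser treats everything inside `<title>` as text, so the DOM stores a literal `<` character, and that character is escaped when the DOM is written back out as markup. The sketch below imitates only that escaping step, using Python's `html.escape` as a stand-in for the browser's serializer (an assumption for illustration, not the browser's actual code).

```python
from html import escape, unescape

# Text content of the <title> element as the parser stores it in the DOM.
raw_title_text = " aaaaa<bbbbb "

# Serializing text back to markup escapes "<" so it cannot start a tag.
serialized = escape(raw_title_text, quote=False)
print(f"<title>{serialized}</title>")

# Unescaping recovers the text content, but not necessarily the original file.
restored = unescape(serialized)
```

Unescaping works for this title, but in general the corrected DOM cannot be turned back into the original source, because other fixes (such as a repaired closing tag) leave no trace of what was there before.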
undetected Selenium