
Note:

An answer could look like this: yes, there are rules and an approach; it depends on the adoption agency algorithm, etc.; you can follow these (links); plus a short description. I am not looking for a full algorithm or anything beyond that.

For example, if an a tag is nested inside another a tag, the Chrome browser restructures the HTML. The same happens for a button tag. Both are invalid HTML structures.

I know the a and button tags are both interactive elements. Interactive elements (a, button, input, etc.) are related to the user's activity.

From the comment:

It's too broad a question.

  1. Only answer for the example code or scenario below.

I think every browser should work the same way from some point on. That means they should follow some basic rules or approach. If anyone has found something like this, please mention it.

Why am I looking for this?

When I work for a client, I find many invalid HTML structures of many kinds. Sometimes I have to redesign around an invalid HTML structure. I think I need to understand how browsers restructure HTML for further web development.

See the related questions Q1 and Q2.

let aTag = document.querySelectorAll('a')
let buttonTag = document.querySelectorAll('button')

console.log('===a tag===')
aTag.forEach(function(item){
    console.log(item)
})

console.log('===button tag===')
buttonTag.forEach(function(item){
    console.log(item)
})

.card{
    display: flex;
    flex-direction: column;
    padding: 10px;
    border: 1px solid slateblue;
    width: 200px;
    text-align: center;
}

<a href="" class="parent a-tag">
    <div class="card">
        <h4>Title</h4>
        <p>Paragraph</p>
        <button>Add to Cart</button>
        <a href="" class="child a-tag">Compare Product</a>
    </div>
</a>


<button>
    <a href="">A Tag</a>
    <button>Button</button>
</button>

Picture from the Google Chrome browser console:


Dev AKS
نور
  • 6
    I'm voting to close this as "too broad", because the algorithms that browsers use to parse HTML are incredibly complex, and an answer on this site could never summarise them all meaningfully. However, have a look at [the WHATWG HTML Specification](https://html.spec.whatwg.org/multipage/) which attempts to standardise a lot of this parsing. – IMSoP Nov 13 '20 at 12:47
  • @IMSoP no need for a broad description; just describe the example or the approach. I think that will be enough. – نور Nov 13 '20 at 12:51
  • 1
    That's what I'm trying to say: there is no "broad example or approach", there are hundreds of lines of code, or hundreds of lines of spec, detailing how to deal with all the edge cases produced by different kinds of broken HTML. – IMSoP Nov 13 '20 at 13:05
  • There are no rules - each browser engine treats quirks mode differently. Not only that, each case of a quirk may be treated differently. In general you can say that the moment the syntax becomes invalid the browser assumes you missed a close tag and will close the last known valid structure but this is not always the case because even what is invalid and valid is sometimes treated differently – slebetman Nov 13 '20 at 13:08
  • @slebetman I believe there is some logic to it, for example for `nextElementSibling` ([link](https://developer.mozilla.org/en-US/docs/Web/API/NonDocumentTypeChildNode/nextElementSibling)). @Louys Patrice Bessette pointed me to this in the comments on Q1. – نور Nov 13 '20 at 13:17
  • I will wait, then I will draw attention to it. I believe I will find a solution, an approach, or a source from someone. Please don't vote to close. – نور Nov 13 '20 at 13:24
  • Whether a question is closed or not should not depend on its answers - either your **question** can be worded so that it is focussed enough to fit the site, or it should be closed. It's important to remember that having your question closed doesn't mean it's not an **interesting** question, just that this isn't the right place to discuss it. – IMSoP Nov 13 '20 at 13:58
  • 1
    @slebetman While you're partly correct, part of the aim of the WHATWG was to re-design the HTML specification so that it does standardise a lot of these "quirks mode" features, rather than the W3C's approach of making the language stricter and simply forbidding them (e.g. XHTML). That doesn't mean there's a single simple principle, though, just that the browser vendors have documented and agreed how various mistakes are handled. For instance, `` is treated as `
    `; not as part of a general rule, it's just in the spec that that's the agreed way to handle that particular mistake.
    – IMSoP Nov 13 '20 at 14:16
  • 1
    Which makes it ironic then that Safari, a browser from one of the founders of WHATWG, often behaves differently from Chrome and Firefox – slebetman Nov 13 '20 at 18:57
  • 2
  • 1
    I think this is the first time I've seen a question with 2 close votes and a bounty at the same time. If you want to create a version of this question that fits the site, you need to find a way to focus it; for instance, describe a particular HTML mistake, and ask how a particular browser handles it. As it is, you could fill a book with details about different things that different browsers do, and still not have answered your question, because **there is no single algorithm**, it's a bunch of special cases, historical accidents, and things that might change weekly when other bugs are fixed. – IMSoP Nov 15 '20 at 20:54
  • 2
    @slebetman what are you talking about. There are definitely rules, for the 2 cases here it's [here](https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inbody:stack-of-open-elements-31) for the – Kaiido Nov 16 '20 at 08:20
  • @slebetman And a document where errors are found doesn't trigger quirks mode. What triggers quirks mode is to not have a doctype. – Kaiido Nov 16 '20 at 08:22
  • @Kaiido This isn't a forum to have a debate about web standards. If you think this question is focussed enough to be answered within the format of this site, it sounds like you might have the knowledge to answer it. If you agree with me that it needs to be more focussed before it can be answered, please Vote to Close. – IMSoP Nov 16 '20 at 08:46
  • 1
    @IMSoP when someone is trying to resolve an issue, why do you need to close it? – نور Nov 16 '20 at 08:51
  • 2
    @IMSoP I agree with you it should get closed, but we can't close a question with an open bounty on it... If *noor* agreed to remove the bounty, I'd be glad to VTC it. Or if they agree to [edit] in a way it asks **clearly** about a single situation, then I'd be happy to answer if I can. – Kaiido Nov 16 '20 at 08:55
  • @noor As I've said several times, because I don't think it's a good fit for this site, **in the way you've currently asked it**. That doesn't mean it's a "bad" question, or that I think badly of you, or of anyone who wants to help you. It just means that this site has a very constrained format, and doesn't suit big open-ended questions like this. – IMSoP Nov 16 '20 at 08:59
  • @Kaiido through your source I found the section on [reconstruct](https://html.spec.whatwg.org/multipage/parsing.html#reconstruct-the-active-formatting-elements)... – نور Nov 16 '20 at 09:31
  • 1
    @noor You realise that's the same site I linked to in my very first comment? To reiterate, there are hundreds of lines of that spec that are relevant to your question in the broad way you've asked it. If you want to know which part of the spec covers a particular scenario you're interested in, [edit] your question to focus on that one scenario. – IMSoP Nov 16 '20 at 09:56
  • 2
    @noor You could for instance ask "Is there a specification which describes how Chrome handles an `a` tag inside a `button` tag?" - naming both the browser and the specific scenario. But note that the answer to that won't tell you anything about, for instance, how the browser handles a `td` that's not in a `table`, or an element with the name `image` (which has the wonderful rule "Change the token's tag name to "img" and reprocess it. (Don't ask.)"!) because there is no universal rule that governs all these things. – IMSoP Nov 16 '20 at 10:12
  • @IMSoP thanks for your response and time. I will edit my question in about 10 hours, or remove it. – نور Nov 16 '20 at 10:17
  • @Kaiido I updated the question. Through your source I found this: An end tag whose tag name is one of: "a", "b", more.. Run the [adoption agency algorithm](https://html.spec.whatwg.org/multipage/parsing.html#adoption-agency-algorithm) for the token. That covers the `a` tag. I also wrote a bit about the `button` tag. Please post your answer. – نور Nov 17 '20 at 07:40
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/224654/discussion-between-noor-and-imsop). – نور Nov 17 '20 at 08:30
  • @Kaiido You misunderstand quirks mode. Quirks mode is when the browser cannot formally parse the document and instead of raising an error tries to guess what the author meant. Not having a doctype is one such error that will trigger quirks mode but quirks mode is also triggered on other errors – slebetman Nov 18 '20 at 01:54
  • @slebetman no, only the Doctype's value can set the [quirks mode](https://dom.spec.whatwg.org/#concept-document-quirks): *"The mode is only ever changed from the default for documents created by the HTML parser based on the presence, absence, or value of the DOCTYPE string, and by a new browsing context"*. – Kaiido Nov 18 '20 at 02:01
  • @Kaiido Quirks mode was invented before the invention of DOCTYPE. So how can DOCTYPE be the requirement for entering quirks mode if it did not exist yet? – slebetman Nov 19 '20 at 04:47
  • @slebetman once again, *what are you talking about?* Doctype is there since even before HTML, it's part of SGML. – Kaiido Nov 19 '20 at 05:22
  • 1
    @slebetman I think it's reasonable to call what you're describing "quirks", but "quirks mode" means more than "a mode where quirks happen", and is not related to invalid markup. Instead, it's a specific mode where browsers emulate behaviour that pre-dates modern standards (in particular, layout algorithms that pre-date CSS), _even for entirely valid HTML_. See for instance [MDN](https://developer.mozilla.org/en-US/docs/Web/HTML/Quirks_Mode_and_Standards_Mode) and [this guide it links to](https://hsivonen.fi/doctype/). – IMSoP Nov 19 '20 at 22:38

1 Answer


First of all, you can read the specification and try to find the cases you are interested in:

https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inbody

In an ideal world, all browsers would implement the algorithms described in the spec for dealing with these situations. But the spec may not cover every case, and since we are all human, mistakes and misinterpretations happen. So I recommend looking at the Chromium source code, where you may find how some quirky cases are handled that are not covered by the spec but are covered by a particular browser engine:

https://source.chromium.org/chromium/chromium/src/+/master:third_party/blink/renderer/core/html/parser/html_tree_builder.cc;l=688;drc=fb051a0f9d7f7b5695e9ba33c30d174702eae7d0;bpv=0;bpt=1
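
To compare what the spec says with what a browser actually builds, one option is to parse the markup from the question and print the resulting element tree. This is only a minimal sketch, not part of the spec or of Chromium: `outline` is a hypothetical helper name, and the markup string is a shortened version of the question's example. The `DOMParser` part runs only in a browser, so it is guarded.

```javascript
// Return an indented outline of the element tree under `node`.
// `node` only needs an iterable `children` list whose items have `tagName`.
function outline(node, depth = 0) {
  const lines = [];
  for (const child of node.children) {
    lines.push('  '.repeat(depth) + child.tagName.toLowerCase());
    lines.push(...outline(child, depth + 1));
  }
  return lines;
}

// Browser-only part: DOMParser is not available in plain Node.js.
if (typeof DOMParser !== 'undefined') {
  // Shortened version of the question's nested <a> example (an assumption).
  const markup =
    '<a href=""><div><button>Add to Cart</button>' +
    '<a href="">Compare Product</a></div></a>';
  const doc = new DOMParser().parseFromString(markup, 'text/html');
  // Prints the tree the parser built, which may differ from the markup as written.
  console.log(outline(doc.body).join('\n'));
}
```

Running this in the console of each browser you care about and comparing the printed outlines should show whether they restructure the invalid markup the same way.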

maksimr
  • 1
    Can you cite any example where Chrome's HTML syntax parsing algorithm deviates from the spec? Or can you cite a case of HTML markup which the spec does not cover? – Alohci Nov 16 '20 at 09:21
  • 1
    @Alohci An open tag whose tag name is "sarcasm" – maksimr Nov 16 '20 at 16:59