12

I'm trying to render some simple HTML documents (containing mostly div and br tags) to plain text, but I'm struggling to work out when to add new lines. I assumed it would be quite simple, with <div> and <br/> each generating a new line, but it looks like there are various subtle rules. For example:

<div>one line</div>
<div>two lines</div>

<hr/>

<div>one line</div>
<div></div>
<div>still two lines because the empty div doesn't count</div>

<hr/>

<div>one line<br/></div>
<div></div>
<div>still two lines because the br tag is ignored</div>

<hr/>

<div>one line<br/></div>
<div><br/></div>
<div>three lines this time because the second br tag is not ignored</div>

<hr/>

<div><div>Wrapped tags generate only one new line<br/></div></div>
<div><br/></div>
<div>three lines this time because the second br tag is not ignored</div>

So I'm looking for a specification on how new lines should be rendered in HTML documents (when no CSS is applied). Any idea where I could find this kind of document?

laurent
  • In your question you are saying `still two lines because the br tag is ignored`, but I am not seeing any br tag in between. – S M Jan 17 '17 at 05:28
  • Even if you know the spec, you're going to have a huge challenge in somehow programmatically translating all of those nested `div`s and `br`s and everything to plaintext newlines. – andi Jan 18 '17 at 20:36

9 Answers

13

If you are looking for the specification for <div> and <br>, you won't find it in one place, because each of them follows separate rules. DIV elements follow the block formatting rules, while BR elements follow the text flow rules.

I believe that the cause of your confusion is the assumption that they follow the same new lines rule. Let me explain.

The BR element.

BR is defined in HTML4 Specification Section 9.3 regarding Lines and Paragraphs:

The BR element forcibly breaks (ends) the current line of text.

And in HTML5 Specification Section 4.5 regarding Text-level semantics:

The <br> element represents a line break.

The specification explains the result of your third example:

<div>one line<br/></div>
<div></div>
<div>still two lines because the br tag is ignored</div>

There, the BR element is not ignored at all, because it marks that the line must be broken at that point. In other words, it marks the end of the current line of text. It is not about creating new lines.

In your fourth example:

<div>one line<br/></div>
<div><br/></div>
<div>three lines this time because the second br tag is not ignored</div>

each BR element also marks the end of a line. In the second DIV, that line has zero characters, so it is rendered as an empty line.

Therefore, the rule is the same in your third and fourth example. Nothing is ignored.
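As a rough illustration of this rule (a minimal sketch, not something taken from either spec), the hypothetical helper `linesInBlock` below takes the inline content of a single block, assumed to contain only text and br tags, and returns the lines it would produce. A trailing br adds no extra line, while a lone br produces exactly one empty line.

```javascript
// Hypothetical helper: split the inline content of ONE block into lines.
// Assumption: the input holds only text and <br> tags (no nested blocks).
function linesInBlock(inlineHtml) {
  // Each <br>, <br/> or <br /> ends the line that precedes it.
  const segments = inlineHtml.split(/<br\s*\/?>/i);
  // Whatever follows the last <br> is a line only if it has content;
  // the end of the block closes a line without needing a <br>.
  if (segments[segments.length - 1].trim() === "") {
    segments.pop();
  }
  return segments.map((segment) => segment.trim());
}

console.log(linesInBlock("one line<br/>")); // [ 'one line' ]
console.log(linesInBlock("<br/>"));         // [ '' ]
```

The same helper returns an empty array for an empty string, which matches the observation that an empty block contributes no line at all.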

The DIV element.

In the absence of an explicit style sheet, the default style applies. A DIV element is by default a block-level element, which means it follows the block formatting context rules defined in CSS Specification Section 9.4.1:

In a block formatting context, boxes are laid out one after the other, vertically, beginning at the top of a containing block.

Therefore, this is also not about creating new lines because in a block formatting context, there is no notion of lines. It is about placing block elements one after another from top to bottom.

In your second example:

<div>one line</div>
<div></div>
<div>still two lines because the empty div doesn't count</div>

the empty DIV has zero height, therefore it has no effect on the rendering of the next block-level element.

In your fifth example:

<div><div>Wrapped tags generate only one new line<br/></div></div>
<div><br/></div>
<div>three lines this time because the second br tag is not ignored</div>

the outer DIV functions as a containing block as defined in Section 9.1.2, and the inner DIV is laid out as described in Section 9.4.1, which I quoted above. Because no CSS is applied, a DIV element by default has zero margin and zero padding, so every edge of the inner DIV touches the corresponding edge of the outer DIV. In other words, the inner DIV is rendered at exactly the same place as the outer DIV.
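For the plain-text use case in the question, the two rules can be combined into a small converter. This is only a sketch under stated assumptions: the input contains just div and br tags plus text, the name `htmlToLines` and the regex tokenizer are illustrative, and character references such as &nbsp; are not decoded. A real converter would be better off with a proper HTML parser.

```javascript
// Sketch only: handles <div> and <br>; other tags are skipped as if inline,
// and character references such as &nbsp; are left undecoded.
function htmlToLines(html) {
  const lines = [];
  let current = ""; // text collected for the line being built

  // A block boundary closes the current line only if it holds visible text,
  // which is why empty divs and directly nested divs add nothing.
  const closeLineIfNonEmpty = () => {
    const text = current.trim();
    if (text !== "") lines.push(text);
    current = "";
  };

  // A token is either a tag (group 1 = tag name) or a run of text.
  const tokenPattern = /<\/?([a-zA-Z][a-zA-Z0-9]*)[^>]*>|[^<]+/g;
  for (const match of html.matchAll(tokenPattern)) {
    const tagName = match[1] ? match[1].toLowerCase() : null;
    if (tagName === "br") {
      // BR always ends the current line, even when that line is empty.
      lines.push(current.trim());
      current = "";
    } else if (tagName === "div") {
      closeLineIfNonEmpty(); // applies to both <div> and </div>
    } else if (tagName === null) {
      current += match[0].replace(/\s+/g, " "); // collapse white space
    }
  }
  closeLineIfNonEmpty();
  return lines;
}

const fourth = "<div>one line<br/></div>\n<div><br/></div>\n<div>three lines</div>";
console.log(htmlToLines(fourth)); // [ 'one line', '', 'three lines' ]
```

Calling `htmlToLines(html).join("\n")` then gives the plain text. Treating the opening and closing div tags identically is a deliberate simplification: with zero margin and padding, a block boundary only matters when there is pending text to flush.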

I believe that's everything.

Rei
  • Thank you, that makes complete sense. I hope you don't mind I've added the examples into the text to make it easier to follow your explanation. – laurent Jan 24 '17 at 10:29
  • @this.lau_ Good call. They do make it easier to follow. Thank you. – Rei Jan 24 '17 at 18:24
9
<div>one line</div>
<div></div>
<div>still two lines because the empty div doesn't count</div>

I wouldn't say that the second div doesn't count. To be more precise, it has the default block width of 100% but a height of 0px because it is empty. Obviously, there's no padding or margin either, but it's still technically there. It counts.

<div>one line<br/></div>
<div></div>
<div>still two lines because the br tag is ignored</div>

The br tag isn't ignored either; it has done its job of creating a line break within the current line of text within the parent block-level div. Emphasized wording is directly from the docs. Note that it mentions the current line of text only. It doesn't create the next line; it creates a break that may lead to a new line if there is content.

There simply isn't any text after it to be placed on the second line. Thus, the next div is created right below and abides by the rules mentioned above.

<div>one line<br/></div>
<div><br/></div>
<div>three lines this time because the second br tag is not ignored</div>

Building on the previous logic, none of the br tags are ever ignored. Both of the tags in this example are actually creating a new line break within their parent block level div elements.

These br tags act like a marker that states "from this point until the end of the line, within my parent block-level element, no inline content is allowed". However, in all of these cases there's nothing to be placed on the next line.

The next div, being a block-level element, basically resets that behavior. The previous breaks are contained within their lines of text and their parent block-level elements. We know this because a line of text cannot stretch between two block-level elements.

In regard to your comment on another answer.

Block level elements do always start on a new line. As explained above, an empty div does exist and does start on a new line, it simply has 0 height. If you have two nested, empty div elements they both start on the same new line because they are both empty block level elements without any content that creates lines. If you add text to a parent div before the child div it will get pushed to a new line. Think of it as the same line of text if it helps. For example:

Same line:

<div>
    <div>
        bar
    </div>
</div>

Different lines:

<div>
    foo
    <div>
        bar
    </div>
</div>
Serg Chernata
2
  • <DIV> = division. It's a block of potentially mixed content.
  • <BR> = break. Just a line break.
  • <P> = paragraph.

If you want to create a document like a word processor then <P> is the way to go.

Lots of new developers seem to struggle with this when implementing tinyMCE the first few times. Hitting [enter] creates a <P>, while [shift]+[enter] creates a <br>. Exactly like a word processor.

Jules
1

What you are missing here, I guess, is that div is a block-level element and thus always starts a new line (without CSS). As for the empty div, I think that since there is nothing to display, it will not render any new line; it may also depend on your browser's implementation of the HTML standard.

You can find more information on block and inline HTML elements here.

Jaay
  • "A block-level element always starts on a new line" - not "always". For example, if it's empty, or if it's wrapping another block-level element, and maybe there are other exceptions. I'm basically looking for the full set of rules (I assume it exists since all browsers seem to be consistent, but I can't find it). – laurent Dec 07 '16 at 10:38
  • @this.lau_ A block-level element *always* starts on a new line. If it's empty, the block-level `<div>` will just have a height of `0`; the height of an element is derived from the height of its content, unless explicitly set. – Joshua Shearer Jan 23 '17 at 21:46
1

For your second example, you can put &nbsp; inside the <div> so that it's rendered as an empty line. Also, for your fourth example, you can put both br tags in the first div.

However, I'm not aware of any specification on this.

<div>one line</div>
<div>&nbsp;</div>
<div>still two lines because the empty div doesn't count</div>

<hr/>

<div>one line<br/><br/></div>
<div>three lines this time because the second br tag is not ignored</div>
Oscar Siauw
1

A block level element will always start on a new line unless it is the immediate first child of another element.

In your example #2

<div>one line</div>
<div></div>
<div>still two lines because the empty div doesn't count</div>

There are three lines, but they appear as two because the second div has no visual content. You can define custom margins and borders to make it visible.

A br element will always break the content flow and the node afterwards will start on a new line, regardless of whether that node happens to be a block-level element or not.

php_nub_qq
1

In your question you say that a <br/> tag between two divs is ignored, but your snippet seems buggy. It actually won't be ignored. I have corrected the snippet. This is the right way of inserting a new line in between without using CSS:

<div>one line</div>
<div>two lines</div>

<hr/>

<div>one line</div>
<div></div>
<div>still two lines because the empty div doesn't count</div>

<hr/>

<div>one line</div>
<br/>
<div>Three lines because the br tag is not ignored</div>

<hr/>

<div>one line</div>
<div><br/></div>
<div>three lines this time because the second br tag is not ignored</div>

<hr/>

<div><div>Wrapped tags generate only one new line<br/></div></div>
<div><br/></div>
<div>three lines this time because the second br tag is not ignored</div>
S M
1

How about letting the jQuery engine render the HTML to text? Take a look at the snippet below; if you click "Run", you'll see an alert box that displays just the text:

var sample = $("#sample").text();
alert(sample);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<html>
<head/>

<body>

  <div id="sample">

    <div>one line</div>
    <div>two lines</div>

    <hr/>

    <div>one line</div>
    <div></div>
    <div>still two lines because the empty div doesn't count</div>

    <hr/>

    <div>one line
      <br/>
    </div>
    <div></div>
    <div>still two lines because the br tag is ignored</div>

    <hr/>

    <div>one line
      <br/>
    </div>
    <div>
      <br/>
    </div>
    <div>three lines this time because the second br tag is not ignored</div>

    <hr/>

    <div>
      <div>Wrapped tags generate only one new line
        <br/>
      </div>
    </div>
    <div>
      <br/>
    </div>
    <div>three lines this time because the second br tag is not ignored</div>

  </div>
</body>

</html>

You can use the content of the variable sample to process it further, for example submit it to an AJAX method.

If you run it, you will notice that all of the tags are taken into account; it is just a matter of how the style defaults are defined. Having said that, I believe you can't disregard the styles completely, because they do matter: even if you don't specify any, some style will be assumed and applied.

What you get from $("#sample").text(); is just the line breaks and plain text, which is what I understood from your question you wanted to achieve.

Matt
  • On Chrome 55, this generates a lot of extra lines in the alert output. – Mike Godin Jan 23 '17 at 16:21
  • @MikeGodin: Yes - same in IE, that is also mentioned in one of the answers. Serg wrote: "I wouldn't say that the second div doesn't count, to be more precise, it has default block width of 100% but 0px of height due to being empty" - which is why I said it isn't independent from the styles how it is looking in the browser. Since jQuery just translates it into CR+LF, everything counts - and you see extra lines. – Matt Jan 23 '17 at 16:29
1

According to the spec, only the <br> and <wbr> elements are meant for line breaks:

  • <br> elements must be used only for line breaks that are actually part of the content, as in poems or addresses.
  • <br> elements must not be used for separating thematic groups in a paragraph (just use another <p> element).

You can also use <wbr> (more info here)

You can find more info at the spec itself. (Single page version to better search) https://www.w3.org/TR/html/single-page.html#elementdef-br

P.S.: Certain attributes accept LF (U+000A), like the title attribute in the <abbr> tag.

In the end any empty block element would do the job. (without CSS) The full list is here

JuanGG