Is there a way to automatically control orphaned words in an HTML document?

Question

I was wondering if there's a way to automatically control orphaned words in an HTML file, possibly by using CSS and/or JavaScript (or something else, if anyone has an alternative suggestion).

By 'orphaned words', I mean single words that appear on a new line at the end of a paragraph. For example:

"This paragraph ends with an undesirable orphaned
word."

Instead, it would be preferable to have the paragraph break as follows:

"This paragraph no longer ends with an undesirable
orphaned word."

While I know that I could manually correct this by placing an HTML non-breaking space (&nbsp;) between the final two words, I'm wondering if there's a way to automate the process, since manual adjustments like this can quickly become tedious for large blocks of text across multiple files.

Incidentally, the CSS2.1 properties orphans (and widows) only apply to entire lines of text, and even then only for the printing of HTML pages (not to mention the fact that these properties are largely unsupported by most major browsers).

Many professional page layout applications, such as Adobe InDesign, can automate the removal of orphans by automatically adding non-breaking spaces where orphans occur; is there any sort of equivalent solution for HTML?

... except for the jQuery plugin that @ShawnChin mentions :-) — Pointy, Jan 13 '12 at 16:59
possible duplicate of [Widow/Orphan Control with JavaScript?](http://stackoverflow.com/questions/4742418/widow-orphan-control-with-javascript) — davidcondrey, Feb 08 '15 at 22:45

Shawn Chin · Answer 1 · 2012-01-13T17:00:02.097

27

You can avoid orphaned words by replacing the space between the last two words in a sentence with a non-breaking space (&nbsp;).

There are plugins out there that do this, for example jqWidon't or this jQuery snippet.

There are also plugins for popular frameworks (such as typogrify for Django and widon't for WordPress) that essentially do the same thing.

edited Jan 13 '12 at 17:00

answered Jan 13 '12 at 16:54

Shawn Chin

The problem with the `&nbsp;` approach is that you could potentially end up (on a narrow display) with a single word on the second-last line followed by two words on the last line, which would look even worse. – clayRay Mar 21 '22 at 23:50

josh1978 · Answer 2 · 2018-02-20T19:51:50.117

I know you wanted a JavaScript solution, but in case someone finds this page while looking for a solution for emails (where JavaScript isn't an option), I decided to post mine.

Use CSS white-space: nowrap. I surround the last two or three words (or wherever I want the "break" to be) in a span and add inline CSS (remember, I deal with email; use a class instead if that suits your case):

<td>
    I don't <span style="white-space: nowrap;">want orphaned words.</span>
</td>

In a fluid/responsive layout, if you do it right, the last few words will break to a second line until there is room for those words to appear on one line.

Read more about the white-space property at this link: http://www.w3schools.com/cssref/pr_text_white-space.asp

EDIT: 12/19/2015 - Since this isn't supported in Outlook, I've been adding a non-breaking space (&nbsp;) between the last two words in a sentence. It's less code, and supported everywhere.

EDIT: 2/20/2018 - I've discovered that the Outlook app (iOS and Android) doesn't support the &nbsp; entity, so I've had to combine both solutions, e.g.:

<td>
    I don't <span style="white-space:nowrap;">want&nbsp;orphaned&nbsp;words.</span>
</td>
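
For anyone generating email markup from a script, the combined span-plus-&nbsp; pattern above could be produced with a small helper. This is only a sketch: bindLastWords and its count parameter are hypothetical names, not part of the answer, and the input is assumed to be plain text with no tags or entities in it.

```javascript
// Hypothetical helper: wraps the last `count` words of plain text in a
// nowrap span and joins them with &nbsp; entities.
// Assumption: `text` contains no HTML tags or entities.
function bindLastWords(text, count = 3) {
    const words = text.trim().split(/\s+/).filter(Boolean);
    // Nothing to bind for fewer than two words or a group smaller than two.
    if (words.length < 2 || count < 2) return text;
    const tail = words.slice(-count).join('&nbsp;');
    const head = words.slice(0, -count).join(' ');
    const span = `<span style="white-space:nowrap;">${tail}</span>`;
    // If every word ended up in the group, return the span on its own.
    return head ? `${head} ${span}` : span;
}
```

Calling bindLastWords("I don't want orphaned words.") yields the same markup as the second table cell above. A larger count keeps more words together but raises the chance of overflow on narrow screens.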

score 3 · Answer 3 · edited May 23 '17 at 12:10

In short, no. This is something that has driven print designers crazy for years, but HTML does not provide this level of control.

If you absolutely positively want this, and understand the speed implications, you can try the suggestion here:

detecting line-breaks with jQuery?

That is the best solution I can imagine, but that does not make it a good solution.
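
A rough idea of what line-break detection involves, offered as a sketch rather than as the linked question's exact method: wrap every word in its own span, read each span's vertical position, and count how many words share the position of the final word. The function name wordsOnLastLine is hypothetical, and the sketch assumes that words on the same rendered line report the same offsetTop.

```javascript
// Counts the words on the last rendered line, given the vertical position
// (for example offsetTop) of each word in document order.
// Assumption: words on the same line report identical positions.
function wordsOnLastLine(tops) {
    if (tops.length === 0) return 0;
    const lastTop = tops[tops.length - 1];
    let count = 0;
    // Walk backwards while the words are still on the final line.
    for (let i = tops.length - 1; i >= 0 && tops[i] === lastTop; i--) {
        count++;
    }
    return count;
}

// Browser-only usage, assuming each word was already wrapped in a <span>
// inside a hypothetical `paragraph` element:
// const tops = [...paragraph.querySelectorAll('span')].map((s) => s.offsetTop);
// if (wordsOnLastLine(tops) === 1) { /* join the last two words */ }
```

The measurement has to be repeated whenever the container width changes, which is where the speed implications mentioned above come from.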

answered Jan 13 '12 at 16:54

Jonathan Rich

score 2 · Answer 4 · answered Jan 21 '22 at 17:13

I see there are third-party plugins suggested, but it's simpler to do it yourself. If all you want to do is replace the last space character with a non-breaking space, it's almost trivial:

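    const unorphanize = (str) => {
        // Find the last space; leave the string unchanged if there is none.
        const iLast = str.lastIndexOf(' ');
        if (iLast === -1) return str;
        // Swap that space for a non-breaking space entity.
        return str.slice(0, iLast) + '&nbsp;' + str.slice(iLast + 1);
    };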

I suppose this may miss some edge cases, but it has worked for all my use cases. The caveat is that you can't just plug the output in where text would go; you have to set innerHTML = unorphanize(text) or otherwise parse it.
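
One possible way around the innerHTML caveat, sketched here under the assumption that the input is plain text: insert the U+00A0 character itself instead of the &nbsp; entity, so the result can be assigned to textContent. The name unorphanizeText is hypothetical and not part of the answer.

```javascript
// Variant of unorphanize that inserts the U+00A0 (no-break space) character
// rather than the "&nbsp;" entity, so the result stays plain text.
const unorphanizeText = (str) => {
    const iLast = str.lastIndexOf(' ');
    // No space found: nothing to join.
    if (iLast === -1) return str;
    return str.slice(0, iLast) + '\u00A0' + str.slice(iLast + 1);
};

// Browser-only usage (`element` is a hypothetical DOM node):
// element.textContent = unorphanizeText(element.textContent);
```

Because nothing is parsed as HTML, this avoids injecting markup from untrusted text, at the cost of only working on elements that contain text and no child tags.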

score 1 · Answer 5 · edited Jan 30 '13 at 09:33

If you want to handle it yourself, without jQuery, you can write a JavaScript snippet to replace the text, if you're willing to make a couple of assumptions:

A sentence always ends with a period.
You always want to replace the whitespace before the last word with &nbsp;

Assuming you have this html (which is styled to break right before "end" in my browser...monkey with the width if needed):

<div id="articleText" style="width:360px;color:black; background-color:Yellow;">
    This is some text with one word on its own line at the end.
    <p />
    This is some text with one word on its own line at the end.
</div>

You can create this javascript and put it at the end of your page:

<script type="text/javascript">
    reformatArticleText();
    function reformatArticleText()
    {
        var div = document.getElementById("articleText");
        div.innerHTML = div.innerHTML.replace(/\s(\S*)\./g, "&nbsp;$1.");
    }
</script>

The regex finds all instances (using the g flag) of a whitespace character (\s) followed by any number of non-whitespace characters (\S*) followed by a period. It captures the non-whitespace characters in a group, which the replacement text references as $1.

You can use a similar regex to include other end punctuation marks.
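
As a sketch of what such a regex might look like for ".", "!" and "?": the function name bindSentenceEnds is hypothetical, and the input is assumed to be text without HTML tags, since a pattern like this can also match inside tag attributes.

```javascript
// Sketch only: joins the last word of each sentence to the word before it.
// Assumption: `text` contains no HTML tags.
function bindSentenceEnds(text) {
    // \s+          whitespace before the sentence's last word
    // (\S+[.!?])   that word together with its closing punctuation
    // (?=\s|$)     the sentence must end here (whitespace or end of input)
    return text.replace(/\s+(\S+[.!?])(?=\s|$)/g, '&nbsp;$1');
}
```

The output contains the &nbsp; entity, so like the snippet above it is meant to be assigned to innerHTML rather than inserted as plain text.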

Thanks for the suggestion! I like the simple elegance of the JavaScript; however, I don't seem to be getting the desired results when testing your code. I uploaded my test to the following link: [ http://littleblackkitten.com/orphan-test.html ] The browser seems to be replacing the final **letter** with the non-breaking space, and not the final **space**. Am I doing something wrong? Do you get successful results when testing the code? Thanks again for your help! — Josh M. Lenius, Jan 13 '12 at 18:49
That regex will actually break some things which end with an html element (for example, if you have an image tag at the end of your article). Replace it with the regex here to keep from messing up inner html: http://justinhileman.info/article/a-jquery-widont-snippet/ — bobthecow, Jan 14 '12 at 02:23
Watch what happens when you run this in the console on this page. `document.body.innerHTML = document.body.innerHTML.replace(/\S(\s*)\./g, "&nbsp;$1.");` — Bryan Downing, Jan 28 '16 at 08:23

score 0 · Answer 6 · answered Dec 18 '17 at 05:05

If third-party JavaScript is an option, one can use typogr.js, a JavaScript "typogrify" implementation. This particular filter is called, unsurprisingly, Widont.

<script src="https://cdnjs.cloudflare.com/ajax/libs/typogr/0.6.7/typogr.min.js"></script>
<script>
document.body.innerHTML = typogr.widont(document.body.innerHTML);
</script>
</body>
