13

This is a problem I'm running into and I'm not quite sure how to approach it.

Say I have a paragraph:

"This is a test paragraph. I love cats. Please apply here"

I want a user to be able to click any word in a sentence and get back the entire sentence that contains it.

Denys Séguret
freedomflyer
  • nextSibling() and previousSibling() – freedomflyer Nov 14 '12 at 20:42
  • 4
    Sentence boundary detection is a medium-hard NLP problem, due to abbreviations and defective punctuation. Not a good target for Javascript. – bmargulies Nov 14 '12 at 20:43
  • 1
    Check out [Rangy](http://code.google.com/p/rangy/). It's a pretty solid text selection library that I've used quite a bit. Should help abstract out a lot of the pain in finding text boundaries such as sentences, words, etc. – Matthew Blancarte Nov 14 '12 at 20:45
  • 3
    How could this be voted as "off topic" ? – Denys Séguret Nov 14 '12 at 20:52
  • I checked out Rangy but I can't seem to find any sentence methods, only for words or smaller selections. Any guidance there? That is an interesting library. – freedomflyer Nov 14 '12 at 20:58
  • Rangy doesn't (yet) do sentence boundary detection, mostly because it's a really hard problem to solve properly in English. Wikipedia has an overview: http://en.wikipedia.org/wiki/Sentence_boundary_disambiguation. I may add a simple implementation in the future. – Tim Down Jan 21 '13 at 15:57
  • @SpencerAllenGardner I've seen you just added a bounty. Can I know what you want that isn't in my existing answer ? – Denys Séguret Jan 25 '13 at 13:25
  • @bmargulies, if only people would type two spaces between sentences! – Samuel Edwin Ward Jan 27 '13 at 00:57

5 Answers

11

You would first have to split your paragraph into elements, as you can't (easily) detect clicks on text without elements:

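$('p').each(function() {
    // The capturing group keeps each punctuation mark in the split() result
    // as its own array entry, so it also ends up in its own span.
    var spans = $(this).text().split(/([.?!])(?= )/).map(function(v) {
        return '<span class="sentence">' + v + '</span>';
    });
    $(this).html(spans.join(''));
});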

Note that it correctly splits paragraphs like this one:

<p>I love cats! Dogs are fine too... Here's a number : 3.4. Please apply here</p>

Then you would bind the click :

$('.sentence').click(function(){
    alert($(this).text());
});

Demonstration

I don't know whether `:` counts as a separator between sentences in English. If it does, it can of course be added to the regex.
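The capturing group in the split above returns each punctuation mark as a separate array entry. Here is a minimal sketch of one way to glue each mark back onto the sentence before it, so that a clicked span holds the whole sentence. `splitSentences` is a hypothetical helper name, and the wiring assumes jQuery 1.7 or later is loaded:

```javascript
// Hypothetical helper: split text into sentences, keeping terminal punctuation.
function splitSentences(text) {
  // With a capturing group, split() returns [body, mark, body, mark, ..., tail].
  var parts = text.split(/([.?!])(?= )/);
  var sentences = [];
  for (var i = 0; i < parts.length; i += 2) {
    // Pair each body with the mark that follows it (the tail has none).
    var sentence = (parts[i] + (parts[i + 1] || '')).trim();
    if (sentence) sentences.push(sentence);
  }
  return sentences;
}

// Browser wiring, skipped when jQuery is not present.
if (typeof jQuery !== 'undefined') {
  jQuery(function ($) {
    $('p').each(function () {
      var $p = $(this);
      var sentences = splitSentences($p.text());
      $p.empty();
      $.each(sentences, function (i, s) {
        // .text() escapes the sentence, so "<" or "&" in the prose is safe.
        $p.append($('<span class="sentence"></span>').text(s));
        $p.append(document.createTextNode(' '));
      });
    });
    // One delegated handler covers every sentence span.
    $(document).on('click', '.sentence', function () {
      alert($(this).text());
    });
  });
}
```

On the sample paragraph this yields four sentences, with "3.4." kept inside its own sentence. Because the paragraph is rebuilt from `.text()`, any nested markup is flattened.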

Ben Rudolph
Denys Séguret
  • 2
    '.' does not always divide a sentence. – bmargulies Nov 14 '12 at 20:43
  • Depends on the problem on which this is applied. For most uses, this would be enough. And this can be tuned (for a little more uses) with a regex. – Denys Séguret Nov 14 '12 at 20:46
  • This will remove the punctuation at the end of the sentence, and that information will be lost. – Justin Morgan - On strike Nov 14 '12 at 21:28
  • 1
    @JustinMorgan Why don't you try it ? You'll see the punctuation isn't removed from the paragraph. A last tuning could be to merge sentences i and i+1 but I won't do it myself given the lack of constructive or positive comments I get on this answer. – Denys Séguret Nov 15 '12 at 06:53
  • No, I mean that it will remove punctuation from the results, which are shown in the alert. I did try it before I wrote that comment. It worked just as I said. The visible text of the paragraph isn't altered, but the punctuation is excluded from the span (and treated as a separate sentence, by the way). Since punctuation is part of a sentence, this is incorrect behavior. – Justin Morgan - On strike Nov 15 '12 at 16:08
  • It seems that this will not be at all practical for my use case: a user browsing the web, reading stories, and adding words he/she comes across. It seems like a lot of overhead, in other words. – freedomflyer Jan 25 '13 at 19:05
  • Also, what about text _not_ in `<p>` tags? – freedomflyer Jan 25 '13 at 19:14
  • 1
    Then you just have to change the initial selector. You don't need a new answer for that... – Denys Séguret Jan 25 '13 at 19:17
5

First of all, be prepared to accept a certain level of inaccuracy. This may seem simple on the surface, but trying to parse natural languages is an exercise in madness. Let us assume, then, that all sentences are punctuated by ., ?, or !. We can forget about interrobangs and so forth for the moment. Let's also ignore quoted punctuation like "!", which doesn't end the sentence.

Also, let's try to grab quotation marks after the punctuation, so that "Foo?" ends up as "Foo?" and not "Foo?.

Finally, for simplicity, let's assume that there are no nested tags inside the paragraph. This is not really a safe assumption, but it will simplify the code, and dealing with nested tags is a separate issue.

$('p').each(function() {
    var sentences = $(this)
        .text()
        .replace(/([^.!?]*[^.!?\s][.!?]['"]?)(\s|$)/g, 
                 '<span class="sentence">$1</span>$2');
    $(this).html(sentences);
});

$('.sentence').on('click', function() { 
    console.log($(this).text()); 
});

It's not perfect (for example, quoted punctuation, abbreviations, and decimal numbers such as 3.4 will break it), but it should handle most ordinary prose.
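Since the code above feeds `.text()` back into `.html()`, characters such as `<` and `&` in the prose would be parsed as markup. A small sketch that escapes the text first and then applies the same regex; `escapeHtml` and `wrapSentences` are hypothetical helper names:

```javascript
// Hypothetical helper: escape the characters that matter inside element content.
// Quotes are left alone so the regex can still see a closing ' or ".
function escapeHtml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Wrap each punctuated sentence in a span, using the same regex as the answer.
function wrapSentences(text) {
  return escapeHtml(text).replace(
    /([^.!?]*[^.!?\s][.!?]['"]?)(\s|$)/g,
    '<span class="sentence">$1</span>$2'
  );
}

// Usage with jQuery (assumed to be loaded):
//   $('p').each(function () { $(this).html(wrapSentences($(this).text())); });
```

Note that text with no closing punctuation, such as "Please apply here" in the question's paragraph, is not matched by this regex and so is left outside any span.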

Justin Morgan - On strike
  • This works pretty well. Only nitpick is to include the `;` and `:` characters since they are natural writing breaks as well. The regex would then be `/(((?![.!?;:]['"]?\s).)*[.!?;:]['"]?)(\s|$)/g` – iwasrobbed Jul 30 '17 at 22:53
2
  1. Match the sentences. You can use a regex along the lines of /[^!.?]+[!.?]/g for this.
  2. Replace each sentence with a wrapping span that has a click event to alert the entire span.
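The two steps above could be sketched roughly as follows, using plain DOM calls. The function names are hypothetical, and the second regex alternative is an addition (not part of the regex suggested above) so that trailing text without punctuation is kept too:

```javascript
// Step 1: match the sentences. The "|[^!.?]+$" alternative is an assumption
// added here to keep a final sentence that has no closing punctuation.
function matchSentences(text) {
  var found = text.match(/[^!.?]+[!.?]+|[^!.?]+$/g) || [];
  return found
    .map(function (s) { return s.trim(); })
    .filter(function (s) { return s.length > 0; });
}

// Step 2 (browser only): wrap each sentence in a span and use one delegated
// click handler on the container instead of one handler per span.
function makeSentencesClickable(container, onSentence) {
  var sentences = matchSentences(container.textContent);
  container.textContent = '';
  sentences.forEach(function (s) {
    var span = document.createElement('span');
    span.className = 'sentence';
    span.textContent = s;
    container.appendChild(span);
    container.appendChild(document.createTextNode(' '));
  });
  container.addEventListener('click', function (event) {
    var span = event.target.closest('.sentence');
    if (span) onSentence(span.textContent);
  });
}

// Usage in a page:
//   makeSentencesClickable(document.querySelector('p'), function (s) { alert(s); });
```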
just.another.programmer
0

I suggest you take a look at Selection and ranges in JavaScript.

There is no built-in method that returns the currently selected sentence, so you have to code that yourself.

Rangy is a JavaScript library for working with selections and ranges across browsers.
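One way to avoid rewriting the page's markup is to read the click position from the browser's Selection API and search outward for sentence boundaries in the clicked text node. This is a rough sketch that assumes sentences end in `.`, `!` or `?`; `sentenceAt` is a hypothetical helper, not part of Rangy or the DOM:

```javascript
// Hypothetical helper: return the sentence of `text` containing index `offset`.
function sentenceAt(text, offset) {
  var isEnd = function (ch) { return ch === '.' || ch === '!' || ch === '?'; };
  // Walk left until just after the previous terminator (or the start).
  var start = offset;
  while (start > 0 && !isEnd(text.charAt(start - 1))) start--;
  // Walk right to the next terminator, then absorb a run such as "...".
  var end = offset;
  while (end < text.length && !isEnd(text.charAt(end))) end++;
  while (end < text.length && isEnd(text.charAt(end))) end++;
  return text.slice(start, end).trim();
}

// Browser wiring, skipped outside a page. It assumes a click on selectable
// text leaves a collapsed selection at the click point.
if (typeof document !== 'undefined') {
  document.addEventListener('click', function () {
    var sel = window.getSelection();
    var node = sel.anchorNode;
    if (node && node.nodeType === 3) { // 3 is the nodeType of a text node
      console.log(sentenceAt(node.nodeValue, sel.anchorOffset));
    }
  });
}
```

This only looks inside a single text node, so a sentence that spans nested elements is cut short, and abbreviations or decimals such as 3.4 are treated as sentence ends.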

Stefan
0

I'm not sure how to get the complete sentence, but you can try this to get the clicked word, by splitting the text on spaces and wrapping each word in a span.

<div id="myDiv" onmouseover="splitToSpans(this)" onclick="alert(event.target.innerHTML)">This is a test paragraph. I love cats. Please apply here</div>

function splitToSpans(element) {
    // Already split into spans on an earlier mouseover: nothing to do.
    if ($(element).children().length)
        return;
    var words = $(element).text().split(' ');
    $(element).empty();
    // Wrap each word (plus a trailing space) in its own span.
    $.each(words, function(i, word) {
        $('<span></span>').text(word + ' ').appendTo(element);
    });
}
Akhil Sekharan