1

I want to extract complete sentences from a text.

The sample text is as follows:

Gryphon interrupted in a low voice. 'Not at all,' said the Dodo, pointing to the confused clamour of the wood--(she considered him to you, Though.Gryphon interrupted in a low voice. 'Not at all,' said the Dodo, pointing to the confused clamour of the wood--(she considered him to you, Though.Gryphon interrupted in a low voice. 'Not at all,' said the Dodo, pointing to the confused clamour of the wood--(she considered him to you, Though.

So far I can get the first 30 words from a large text, but the result ends with an incomplete sentence, which I want to remove.

Here is the function that gets the 30 words:

/**
 * @param string $sentence
 * @param int $count
 * @return string
 */
function get_words($sentence, $count = 30) {
    preg_match("/(?:\w+(?:\W+|$)){0,$count}/", $sentence, $matches);
    return $matches[0];
}

I took the function above from this question:

How to select first 10 words of a sentence?

When I pass the text above to the function, I get this output:

Gryphon interrupted in a low voice. 'Not at all,' said the Dodo, pointing to the confused clamour of the wood--(she considered him to you, Though.Gryphon interrupted in a

Here the last sentence is incomplete, and I don't want it in my output.

Is there any way to achieve this?

I'm working with PHP and Laravel. Any help or suggestions are appreciated.

Sayed Mohd Ali
Sagar Gautam
  • What you're trying to do is pretty complex, I would say, unless you expect some kind of recurrence of the incomplete sentences in the input. Otherwise, I would suggest taking a look at Natural Language Processing software (spaCy is one example). That type of software can help you dissect these sentences, get tokens, and determine whether there is enough in a sentence for it to be a full sentence. – T. Altena Jan 03 '19 at 07:36
  • @T.Altena Thanks for your response. I thought of that too, but implementing NLP seems like too much for such a minor task. I'm looking for a programming tweak that gets me something quite similar. – Sagar Gautam Jan 03 '19 at 07:38
  • You can look at the answers below for regex approximations of sentence endings. Beware though: if you are going to examine punctuation, abbreviations might throw you off (Prof. X said Y...), and not all sentences end with a dot! – T. Altena Jan 03 '19 at 07:43
  • @T.Altena I've got one solution for my case – Sagar Gautam Jan 03 '19 at 07:46

2 Answers

1

The code below may help.

<?php
$sen="Gryphon interrupted in a low voice. 'Not at all,' said the Dodo, pointing to the confused clamour of the wood--(she considered him to you, Though.Gryphon interrupted in a low voice. 'Not at all,' said the Dodo, pointing to the confused clamour of the wood--(she considered him to you, Though.Gryphon interrupted in a low voice. 'Not at all,' said the Dodo, pointing to the confused clamour of the wood--(she considered him to you, Though.";
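// First 30 words; this may stop in the middle of a sentence.
$cropped_data = get_words($sen);
$strlength = strlen($cropped_data);
// Take the rest of the text up to the next full stop to finish that sentence.
$remains = complete_sentence(substr($sen, $strlength));

function complete_sentence($content) {
    $pos = strpos($content, '.');
    // No full stop left: return everything instead of a single stray character.
    if ($pos === false) {
        return $content;
    }
    return substr($content, 0, $pos + 1);
}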

function get_words($sentence, $count = 30) {
    preg_match("/(?:\w+(?:\W+|$)){0,$count}/", $sentence, $matches);
    return $matches[0];
}

echo "complete sentence<br/>".$cropped_data.$remains;
?>

Thanks.

Ramya
  • Works perfectly as I want. Thanks – Sagar Gautam Jan 03 '19 at 07:42
  • When calling `complete_sentence()`, only the resulting `$cropped_data` should be passed, rather than `$sen` in substr's first parameter. Otherwise, if a `.` exists in `$sen` after the first 30 words, the result extends up to that point and ends up with more than 30 words. There is also no benefit in computing `$cropped_data` if it is not used afterwards. I have made an edit to the answer; I hope it gets accepted. – Anant Jan 03 '19 at 07:46
0
function get_words($sentence, $count = 30) {

  // \w+ matches a word, \W+ the separators after it (or $ at the end of the text).
  preg_match("/(?:\w+(?:\W+|$)){0,$count}/", $sentence, $matches);

  return $matches[0];
}

$sentence = "Gryphon interrupted in a low voice. 'Not at all,' said the Dodo, pointing to the confused clamour of the wood--(she considered him to you, Though.Gryphon interrupted in a low voice. 'Not at all,' said the Dodo, pointing to the confused clamour of the wood--(she considered him to you, Though.Gryphon interrupted in a low voice. 'Not at all,' said the Dodo, pointing to the confused clamour of the wood--(she considered him to you, Though.";

$cropSentence =  get_words($sentence);

$finalSentence= substr($cropSentence, 0, strrpos($cropSentence, "."));
echo $finalSentence;

This returns everything up to, but not including, the last full stop (.):

Gryphon interrupted in a low voice. 'Not at all,' said the Dodo, pointing to the confused clamour of the wood--(she considered him to you, Though
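
As a variation on the same idea, here is a minimal sketch that keeps the final punctuation mark and also treats `!` and `?` as sentence endings. The helper name `trim_to_last_sentence` is hypothetical (it is not from either answer), and returning the text unchanged when no terminator is found is an assumed fallback. As noted in the comments on the question, abbreviations such as "Prof." will still fool a punctuation-based approach like this one.

```php
<?php
// Hypothetical helper: trims $text back to the last sentence terminator
// ('.', '!' or '?') and keeps that terminator in the result.
function trim_to_last_sentence(string $text): string {
    // Greedy match from the start of the string up to the last terminator.
    // The 's' modifier lets '.' in the pattern match newlines as well.
    if (preg_match('/^.*[.!?]/s', $text, $m)) {
        return $m[0];
    }
    // Assumed fallback: no terminator found, so return the text unchanged.
    return $text;
}

echo trim_to_last_sentence("First sentence. Second one! Third is cut");
```

It could be applied to the cropped text in the same way as the `substr`/`strrpos` line above, for example `trim_to_last_sentence(get_words($sentence))`.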

Md.Sukel Ali