Ordering sentences according to their length

Question

I tried using this code that I found online:

K=sentences
m=[len(i.split()) for i in K]

lengthorder= sorted(K, key=len, reverse=True)
#print(lengthorder)

#print("\n")

list1 = lengthorder
str1 = '\n'.join(list1)

print(str1)

print('\n')

Sentence1 = "We have developed speed, but we have shut ourselves in"
res = len(Sentence1.split())
print ("The longest sentence in this text contains" + ' ' + str(res) + ' ' + "words.")

Sentence2 = "More than cleverness we need kindness and gentleness"
res = len(Sentence2.split())
print ("The second longest sentence in this text contains" + ' ' + str(res) + ' ' + "words.")

Sentence3 = "Machinery that gives abundance has left us in want"
res = len(Sentence3.split())
print ("The third longest sentence in this text contains" + ' ' + str(res) + ' ' + "words.")

but it doesn't sort the sentences by number of words; it sorts them by their actual length (as in the number of characters).

Writing an example of your code and your data can be helpful. Share more details, like an input text. Are the input texts separate, or are they all in one text so that you first have to split them and then sort by length? — zana saedpanah, Nov 30 '21 at 13:01
@maria, it'll be more helpful if you put these comments in the original post as edits. The code especially is hard to read in the comment here. You can just hit 'edit' on the original post and put it all in there. — scotscotmcc, Nov 30 '21 at 13:07
It's better to edit your question and add your data there, not in the comments. — zana saedpanah, Nov 30 '21 at 13:07
I am sorry, I am very new here and uncertain how things work. Thank you!! — maria, Nov 30 '21 at 13:10
Write a function that determines how many words are in a sentence. — Kenny Ostrom, Nov 30 '21 at 13:32
Sentences in English end with a period, an exclamation mark, or a question mark. The text you've shown has none of these. — jarmod, Nov 30 '21 at 13:37
If you're new, you should take the tour and have a look at How to Ask. It is important to provide a minimal reproducible example when asking for help with your code. — Tomerikoo, Nov 30 '21 at 14:39

zana saedpanah · Answer 1 · 2021-11-30T18:39:18.717

Let's assume that your sentences are already separated, so there is no need to detect sentence boundaries and we have a list of sentences. We then need to calculate the length of each sentence as a word count. The basic way is to split on whitespace, since each space separates two words in a sentence.

list_of_sen = ['We have developed speed, but we have shut ourselves in','Machinery that gives abundance has left us in want Our knowledge has made us cynical Our cleverness',   'hard and unkind We think too much and feel too little More than machinery we need humanity More than cleverness we need kindness and gentleness']

# word count of each sentence (split on whitespace)
sen_len = [len(i.split()) for i in list_of_sen]

# largest word count first
sen_len = sorted(sen_len, reverse=True)

for index, count in enumerate(sen_len):
    print(f'The {index+1} longest sentence in this text contains {count} words')
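
The loop above reports the word counts, but once the counts are sorted they are no longer attached to their sentences. A minimal sketch that sorts the sentences themselves by word count, assuming words are separated by whitespace (`word_count` and the shortened sample list are illustrative, not from the question):

```python
def word_count(sentence: str) -> int:
    """Number of whitespace-separated words in a sentence."""
    return len(sentence.split())


# Illustrative data; replace with your own list of sentences.
sample_sentences = [
    "We think too much and feel too little",
    "More than machinery we need humanity",
    "More than cleverness we need kindness and gentleness",
]

# key=word_count sorts by number of words;
# key=len would sort by number of characters instead.
by_word_count = sorted(sample_sentences, key=word_count, reverse=True)

for rank, sentence in enumerate(by_word_count, start=1):
    print(f"{rank}. ({word_count(sentence)} words) {sentence}")
```

Sentences with the same word count keep their original relative order, because Python's sort is stable.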

But if your sentences are not separated, we first need to recognize where each sentence ends and then split the text there. Your sample data does not contain any punctuation that could be used to separate sentences. If we assume that your data does have punctuation, the snippet below can be helpful.

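from nltk import tokenize

# sent_tokenize needs NLTK's Punkt data, e.g. nltk.download('punkt')
# (newer NLTK releases may ask for 'punkt_tab' instead)
p = "Good morning Dr. Adams. The patient is waiting for you in room number 3."
print(tokenize.sent_tokenize(p))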
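
If NLTK is not available, a rough standard-library sketch can split on sentence-ending punctuation and then rank the pieces by word count. This assumes every sentence ends with `.`, `!` or `?` followed by whitespace, so it will wrongly split after abbreviations such as "Dr." (the kind of case `sent_tokenize` is meant to handle). `split_sentences` is a hypothetical helper name and the text is made up for illustration:

```python
import re


def split_sentences(text: str) -> list:
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


# Illustrative text with punctuation added; not the asker's original data.
text = (
    "We think too much and feel too little. "
    "More than machinery we need humanity! "
    "Do we need kindness?"
)

sentences = split_sentences(text)

# Rank by number of whitespace-separated words, longest first.
ranked = sorted(sentences, key=lambda s: len(s.split()), reverse=True)

for rank, sentence in enumerate(ranked, start=1):
    print(f"{rank}. ({len(sentence.split())} words) {sentence}")
```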

score 0 · Answer 2 · answered Nov 30 '21 at 13:40

You can simply iterate through the sentences and split them up into words like this:

text = " We have developed speed. but we have. shut ourselves in Machinery that. gives abundance has left us in want Our knowledge has made us cynical Our cleverness, hard and unkind We think too much and feel too little More than machinery we need humanity More than cleverness we need kindness and gentleness"
# split into sentences
text2array = text.split(".")

# iterate through the sentences and split each one into words;
# split() without arguments ignores leading and repeated spaces,
# so no empty strings are counted as words
text2array = [sentance.split() for sentance in text2array]

# sort the sentences by number of words, longest first
text2array.sort(key=len, reverse=True)

# iterate through the sentences and print them to the screen
for i, sentance in enumerate(text2array, start=1):
    sentanceOut = " ".join(sentance) + "."
    print("the nr " + str(i) + " longest sentence is " + sentanceOut)

score 0 · Answer 3 · answered Nov 30 '21 at 14:29

You can define a function that uses a regular expression to obtain the number of words in a given sentence:

import re

def get_word_count(sentence: str) -> int:
    return len(re.findall(r"\w+", sentence))
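
One thing to be aware of with this pattern: `\w+` matches runs of letters, digits and underscores, so an apostrophe or a hyphen splits a token in two. A small comparison against plain whitespace splitting (the sample sentence is made up for illustration):

```python
import re

sample = "Don't over-think it"

# \w+ stops at the apostrophe and at the hyphen
regex_tokens = re.findall(r"\w+", sample)

# split() only breaks on whitespace
space_tokens = sample.split()

print(regex_tokens)  # ['Don', 't', 'over', 'think', 'it']
print(space_tokens)  # ["Don't", 'over-think', 'it']
```

Which count is "right" depends on what you want to treat as a word; either works as a sort key as long as it is applied consistently.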

Assuming you already have a list of sentences, you can iterate over the list, pass each sentence to the word count function, and store each sentence with its word count in a dictionary:

sentences = [
    "Assume that this sentence has one word. Really?",
    "Assume that this sentence has more words than all sentences in this list. Obviously!",
    "Assume that this sentence has more than one word. Duh!",
]

word_count_dict = {}

for sentence in sentences:
    word_count_dict[sentence] = get_word_count(sentence)

At this point, word_count_dict contains the sentences as keys and their word counts as values.

You can then sort word_count_dict by values:

sorted_word_count_dict = dict(
    sorted(word_count_dict.items(), key=lambda item: item[1], reverse=True)
)

Here's the full snippet:

import re


def get_word_count(sentence: str) -> int:
    return len(re.findall(r"\w+", sentence))


sentences = [
    "Assume that this sentence has one word. Really?",
    "Assume that this sentence has more words than all sentences in this list. Obviously!",
    "Assume that this sentence has more than one word. Duh!",
]

word_count_dict = {}

for sentence in sentences:
    word_count_dict[sentence] = get_word_count(sentence)

sorted_word_count_dict = dict(
    sorted(word_count_dict.items(), key=lambda item: item[1], reverse=True)
)

print(sorted_word_count_dict)
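
Because the dictionary is keyed by sentence, two identical sentences would collapse into a single entry. If that matters, one alternative sketch is to sort the list directly and use the counting function as the sort key (`get_word_count` is repeated here so the snippet runs on its own; the sample list is illustrative):

```python
import re


def get_word_count(sentence: str) -> int:
    return len(re.findall(r"\w+", sentence))


sentences = [
    "Short one.",
    "A somewhat longer sentence here.",
    "Short one.",  # duplicate on purpose
]

# Sort the sentences themselves; duplicates are kept and ties keep input order.
sorted_sentences = sorted(sentences, key=get_word_count, reverse=True)

for sentence in sorted_sentences:
    print(get_word_count(sentence), sentence)
```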
