
Not sure how to remove the "\n" at the end of the output

Basically, I have this txt file with sentences such as:

"What does Bessie say I have done?" I asked.

"Jane, I don't like cavillers or questioners; besides, there is something truly forbidding in a child 
 taking up her elders in that manner.
 
Be seated somewhere; and until you can speak pleasantly, remain silent."

I managed to split the sentences by semicolon with code:

import re
with open("testing.txt") as file:
    read_file = file.readlines()
for i, word in enumerate(read_file):
    low = word.lower()
    re.split(';',low)

But I'm not sure how to count the words of the split sentences, as len() doesn't give the word count. The output of the split sentences:

['"what does bessie say i have done?" i asked.\n']
['"jane, i don\'t like cavillers or questioners', ' besides, there is something truly forbidding in a child taking up her elders in that manner.\n']
['be seated somewhere', ' and until you can speak pleasantly, remain silent."\n']

For the third sentence, for example, I am trying to count the 3 words on the left and the 8 words on the right.

Thanks for reading!

pajamas

5 Answers


The number of words is the number of spaces plus one:

e.g. Two spaces, three words:

World is wonderful

Code:

import re
import string

lines = []
with open('file.txt', 'r') as f:
    lines = f.readlines()

DELIMITER = ';'
word_count = []
for i, sentence in enumerate(lines):
    # Skip empty sentences
    if not sentence.strip():
        continue
    # Remove punctuation besides our delimiter ';'
    sentence = sentence.translate(str.maketrans('', '', string.punctuation.replace(DELIMITER, '')))
    # Split by our delimiter
    splitted = re.split(DELIMITER, sentence)
    # The number of words is the number of spaces plus one
    word_count.append([1 + x.strip().count(' ') for x in splitted])

# [[9], [7, 9], [7], [3, 8]]
print(word_count)
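One caveat: the spaces-plus-one rule assumes words are separated by exactly one space. str.split() with no argument collapses any run of whitespace into one separator, so it is more robust. A small comparison, using a made-up clause with extra spaces:

```python
clause = "be  seated   somewhere"  # note the doubled/tripled spaces

# Spaces plus one overcounts here: 5 spaces + 1 = 6
print(1 + clause.strip().count(' '))  # 6

# split() with no argument treats each whitespace run as one separator
print(len(clause.split()))  # 3
```

With single-spaced input the two methods agree, so the answer above works for the question's file as shown.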
Aviv Yaniv

You'll need the nltk library (its tokenizer data must be fetched once with nltk.download('punkt')):

from nltk import sent_tokenize, word_tokenize

mytext = """I have a dog. 
The dog is called Bob."""

for sent in sent_tokenize(mytext): 
    print(len(word_tokenize(sent)))

Output

5
6

Step by step explanation:

for sent in sent_tokenize(mytext): 
    print('Sentence >>>',sent) 
    print('List of words >>>',word_tokenize(sent)) 
    print('Count words per sentence>>>', len(word_tokenize(sent))) 

Output:

Sentence >>> I have a dog.
List of words >>> ['I', 'have', 'a', 'dog', '.']
Count words per sentence>>> 5
Sentence >>> The dog is called Bob.
List of words >>> ['The', 'dog', 'is', 'called', 'Bob', '.']
Count words per sentence>>> 6
pajamas

Use str.rstrip('\n') to remove the \n at the end of each sentence.

To count the words in a sentence, you can use len(sentence.split()).

To transform a list of sentences into a list of counts, you can use the map function.

So here it is:

import re

with open("testing.txt") as file:
    for i, line in enumerate(file.readlines()):
        # Ignore empty lines
        if line.strip(' ') != '\n':
            line = line.lower()
            # Split by semicolons
            parts = re.split(';', line)
            print("SENTENCES:", parts)
            counts = list(map(lambda part: len(part.split()), parts))
            print("COUNTS:", counts)

Outputs

SENTENCES: ['"what does bessie say i have done?" i asked.']
COUNTS: [9]
SENTENCES: ['"jane, i don\'t like cavillers or questioners', ' besides, there is something truly forbidding in a child ']
COUNTS: [7, 9]
SENTENCES: [' taking up her elders in that manner.']
COUNTS: [7]
SENTENCES: ['be seated somewhere', ' and until you can speak pleasantly, remain silent."']
COUNTS: [3, 8]
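The three pieces described above (rstrip, split, map) can also be seen in isolation on a single clause; a minimal sketch using one line from the question's file:

```python
import re

line = 'be seated somewhere; and until you can speak pleasantly, remain silent."\n'
line = line.rstrip('\n')                  # drop the trailing newline
parts = re.split(';', line.lower())       # split on the semicolon
counts = list(map(lambda part: len(part.split()), parts))
print(counts)  # [3, 8]
```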
Quan To


import re

sentences = []                                                   # empty list for storing the result
with open('testtext.txt') as fileObj:
    lines = [line.strip() for line in fileObj if line.strip()]   # list of lines, already stripped of '\n'
for line in lines:
    sentences += re.split(';', line)                             # split lines by ';' and store the result in sentences
for sentence in sentences:
    print(sentence + ' ' + str(len(sentence.split())))           # output

try this one:

import re
with open("testing.txt") as file:
    read_file = file.readlines()
    for i, word in enumerate(read_file):
        low = word.lower()
        low = low.strip()
        low = low.replace('\n', '')
        re.split(';', low)
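Note that, as in the question's original loop, the result of re.split above is discarded. To actually obtain the word counts you need to keep the result; a minimal sketch, with the file lines replaced by an in-memory sample so the snippet is self-contained:

```python
import re

# Stand-in for file.readlines() on testing.txt
read_file = [
    '"What does Bessie say I have done?" I asked.\n',
    'Be seated somewhere; and until you can speak pleasantly, remain silent."\n',
]

word_counts = []
for word in read_file:
    low = word.lower().strip()
    if not low:
        continue                       # skip blank lines
    parts = re.split(';', low)         # keep the split result this time
    word_counts.append([len(p.split()) for p in parts])

print(word_counts)  # [[9], [3, 8]]
```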
Cuong DaoVan