
Not sure how to remove the "\n" at the end of the output

Basically, I have this txt file with sentences such as:

"What does Bessie say I have done?" I asked.

"Jane, I don't like cavillers or questioners; besides, there is something truly forbidding in a child 
 taking up her elders in that manner.
 
Be seated somewhere; and until you can speak pleasantly, remain silent."

I managed to split the sentences by semicolon with code:

import re
with open("testing.txt") as file:
    read_file = file.readlines()
for i, word in enumerate(read_file):
    low = word.lower()
    re.split(';',low)

But I'm not sure how to count the words of the split sentences, as len() doesn't give the word count. The output of the split sentences:

['"what does bessie say i have done?" i asked.\n']
['"jane, i don\'t like cavillers or questioners', ' besides, there is something truly forbidding in a child taking up her elders in that manner.\n']
['be seated somewhere', ' and until you can speak pleasantly, remain silent."\n']

For the third sentence, for example, I am trying to count the 3 words on the left and the 8 words on the right.

Thanks for reading!

pajamas

5 Answers


The number of words is the number of spaces plus one:

e.g. Two spaces, three words:

World is wonderful

Code:

import re
import string

lines = []
with open('file.txt', 'r') as f:
    lines = f.readlines()

DELIMITER = ';'
word_count = []
for i, sentence in enumerate(lines):
    # Skip empty sentences
    if not sentence.strip():
        continue
    # Remove punctuation besides our delimiter ';'
    sentence = sentence.translate(str.maketrans('', '', string.punctuation.replace(DELIMITER, '')))
    # Split by our delimiter
    splitted = re.split(DELIMITER, sentence)
    # The number of words is the number of spaces plus one
    word_count.append([1 + x.strip().count(' ') for x in splitted])

# [[9], [7, 9], [7], [3, 8]]
print(word_count)
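One caveat: the spaces-plus-one rule assumes words are separated by exactly one space. str.split() with no argument collapses any run of whitespace into one separator, so it is more robust. A small comparison, using a made-up clause with extra spaces:

```python
clause = "be  seated   somewhere"  # note the doubled/tripled spaces

# Spaces plus one overcounts here: 5 spaces + 1 = 6
print(1 + clause.strip().count(' '))  # 6

# split() with no argument treats each whitespace run as one separator
print(len(clause.split()))  # 3
```

With single-spaced input the two methods agree, so the answer above works for the question's file as shown.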
Aviv Yaniv

You'll need the nltk library (its tokenizer data must be fetched once with nltk.download('punkt')):

from nltk import sent_tokenize, word_tokenize

mytext = """I have a dog. 
The dog is called Bob."""

for sent in sent_tokenize(mytext): 
    print(len(word_tokenize(sent)))

Output

5
6

Step by step explanation:

for sent in sent_tokenize(mytext): 
    print('Sentence >>>',sent) 
    print('List of words >>>',word_tokenize(sent)) 
    print('Count words per sentence>>>', len(word_tokenize(sent))) 

Output:

Sentence >>> I have a dog.
List of words >>> ['I', 'have', 'a', 'dog', '.']
Count words per sentence>>> 5
Sentence >>> The dog is called Bob.
List of words >>> ['The', 'dog', 'is', 'called', 'Bob', '.']
Count words per sentence>>> 6
pajamas

Use str.rstrip('\n') to remove the \n at the end of each sentence.

To count the words in a sentence, you can use len(sentence.split()).

To transform a list of sentences into a list of counts, you can use the map function.

So here it is:

import re

with open("testing.txt") as file:
    for i, line in enumerate(file.readlines()):
        # Ignore empty lines
        if line.strip(' ') != '\n':
            line = line.lower()
            # Split by semicolons
            parts = re.split(';', line)
            print("SENTENCES:", parts)
            counts = list(map(lambda part: len(part.split()), parts))
            print("COUNTS:", counts)

Outputs

SENTENCES: ['"what does bessie say i have done?" i asked.']
COUNTS: [9]
SENTENCES: ['"jane, i don\'t like cavillers or questioners', ' besides, there is something truly forbidding in a child ']
COUNTS: [7, 9]
SENTENCES: [' taking up her elders in that manner.']
COUNTS: [7]
SENTENCES: ['be seated somewhere', ' and until you can speak pleasantly, remain silent."']
COUNTS: [3, 8]
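The three pieces described above (rstrip, split, map) can also be seen in isolation on a single clause; a minimal sketch using one line from the question's file:

```python
import re

line = 'be seated somewhere; and until you can speak pleasantly, remain silent."\n'
line = line.rstrip('\n')                  # drop the trailing newline
parts = re.split(';', line.lower())       # split on the semicolon
counts = list(map(lambda part: len(part.split()), parts))
print(counts)  # [3, 8]
```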
Quan To


import re

sentences = []                                                   # empty list for storing the result
with open('testtext.txt') as fileObj:
    lines = [line.strip() for line in fileObj if line.strip()]   # list of lines, already stripped of '\n'
for line in lines:
    sentences += re.split(';', line)                             # split lines by ';' and store the result in sentences
for sentence in sentences:
    print(sentence + ' ' + str(len(sentence.split())))           # output

try this one:

import re
with open("testing.txt") as file:
    read_file = file.readlines()
    for i, word in enumerate(read_file):
        low = word.lower()
        low = low.strip()
        low = low.replace('\n', '')
        re.split(';', low)
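Note that, as in the question's original loop, the result of re.split above is discarded. To actually obtain the word counts you need to keep the result; a minimal sketch, with the file lines replaced by an in-memory sample so the snippet is self-contained:

```python
import re

# Stand-in for file.readlines() on testing.txt
read_file = [
    '"What does Bessie say I have done?" I asked.\n',
    'Be seated somewhere; and until you can speak pleasantly, remain silent."\n',
]

word_counts = []
for word in read_file:
    low = word.lower().strip()
    if not low:
        continue                       # skip blank lines
    parts = re.split(';', low)         # keep the split result this time
    word_counts.append([len(p.split()) for p in parts])

print(word_counts)  # [[9], [3, 8]]
```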
Cuong DaoVan