
My code is behaving strangely, and I have a feeling it has to do with the regular expressions I'm using.

I'm trying to determine the total number of words, the number of unique words, and the number of sentences in a text file.

Here is my code:

import sys
import re

file = open('sample.txt', 'r')


def word_count(file):
    words = []
    reg_ex = r"[A-Za-z0-9']+"
    p = re.compile(reg_ex)
    for l in file:
        for i in p.findall(l):
            words.append(i)
    return len(words), len(set(words))

def sentence_count(file):
    sentences = []
    reg_ex = r'[a-zA-Z0-9][.!?]'
    p = re.compile(reg_ex)
    for l in file: 
        for i in p.findall(l):
            sentences.append(i)
    return sentences, len(sentences)

sentence, sentence_count = sentence_count(file)
word_count, unique_word_count = word_count(file)

print('Total word count:  {}\n'.format(word_count) + 
    'Unique words:  {}\n'.format(unique_word_count) + 
'Sentences:  {}'.format(sentence_count))

The output is the following:

Total word count:  0
Unique words:  0
Sentences:  5

What is really strange is that if I comment out the sentence_count() function, the word_count() function starts working and outputs the correct numbers.

Why is this inconsistency happening? Whichever function I comment out, the other one outputs the correct value, but with both in place one of them outputs 0s. How can I get both functions to work?

liz


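The issue is that a file object is an iterator: once the first function has looped over it, the read position sits at the end of the file, so a second loop over the same object yields no lines at all. You need to either reopen the file or rewind it before iterating over it again.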

For example:

with open('sample.txt', 'r') as f:
    sentence, sentence_count = sentence_count(f)
with open('sample.txt', 'r') as f:
    word_count, unique_word_count = word_count(f)

Alternatively, calling seek(0) on the file object (file.seek(0) in your code) between the two function calls rewinds it to the beginning.
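A small sketch of both behaviors, using io.StringIO as an in-memory stand-in for sample.txt (it is iterated line by line just like a file object). The functions are renamed to count_words and count_sentences, hypothetical names chosen so that assigning their results does not overwrite the functions themselves, and count_sentences returns only the count rather than the list of matches.

```python
import io
import re

WORD_RE = re.compile(r"[A-Za-z0-9']+")
SENTENCE_RE = re.compile(r"[a-zA-Z0-9][.!?]")

def count_words(lines):
    # Collect every word-like token from every line
    words = [w for line in lines for w in WORD_RE.findall(line)]
    return len(words), len(set(words))

def count_sentences(lines):
    # Count alphanumeric characters followed by sentence-ending punctuation
    return sum(len(SENTENCE_RE.findall(line)) for line in lines)

# io.StringIO stands in for open('sample.txt', 'r')
f = io.StringIO("The cat sat. The dog ran!\nIs it over? Yes.\n")

n_sentences = count_sentences(f)   # first pass consumes the stream
exhausted = count_words(f)         # second pass sees no lines: (0, 0)

f.seek(0)                          # rewind to the start of the stream
n_words, n_unique = count_words(f)
print(n_sentences, exhausted, n_words, n_unique)
```

Without the seek(0), the second call reproduces the 0 counts from the question; after rewinding, it sees all 10 words (9 unique).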

NPE
thanks! I definitely didn't know that you can only iterate over an open file once -- really helpful to know. – liz Jul 31 '18 at 19:52

Make sure to open and close your file properly. One way to do this is to read all of the text into memory first:

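with open('sample.txt', 'r') as f:
    lines = f.readlines()  # read every line into a list

The with statement opens the file and safely closes the file handle when the block ends. Since all of the contents are now stored in the list lines, you don't need the file open anymore, and unlike a file object, a list can be iterated over as many times as you like. (Reading with f.read() instead would give a single string, and looping over a string yields individual characters rather than lines.)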
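A minimal sketch of the read-once approach, assuming the text is small enough to hold in memory. analyze is a hypothetical helper that merges the question's two functions into a single pass over a list of lines, and io.StringIO stands in for sample.txt so the example runs on its own.

```python
import io
import re

def analyze(lines):
    """Return (total words, unique words, sentences) for a list of lines."""
    words = []
    sentences = 0
    for line in lines:
        words.extend(re.findall(r"[A-Za-z0-9']+", line))
        sentences += len(re.findall(r"[a-zA-Z0-9][.!?]", line))
    return len(words), len(set(words)), sentences

# io.StringIO stands in for open('sample.txt', 'r') so the sketch runs as-is
with io.StringIO("The cat sat. The dog ran!\nIs it over? Yes.\n") as f:
    lines = f.readlines()  # a list: safe to iterate any number of times

# The stream is closed now, but the list can still be reused
first = analyze(lines)
second = analyze(lines)
print(first)
```

Because the data lives in a list rather than a file object, calling analyze twice gives the same result both times.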

Sunny Patel