There are a few problems with your code.
Specifically, the method split() with no arguments splits a string on any whitespace rather than on line boundaries; splitlines() will split on lines.
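A quick illustration of the difference, using a made-up two-line string (the sample text is an assumption for demonstration only):

```python
# Hypothetical sample text, not taken from the original question.
sample = "the quick brown fox\njumps over the lazy dog\n"

words = sample.split()        # splits on any run of whitespace, newlines included
lines = sample.splitlines()   # splits on line boundaries only

print(len(words))  # 9
print(len(lines))  # 2
```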
Additionally, your code:
word_count += len(wordslist)
adds the length of wordslist to the word count once for each element of wordslist. This is almost certainly not what you want!
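To see how quickly this over-counts, here is a small sketch. The loop shape is an assumption about the original code (it is not quoted here), and the sample string is made up:

```python
# Hypothetical input; the real data would come from the file.
wordslist = "one two three".split()

word_count = 0
for word in wordslist:
    # Assumed shape of the original loop: the full list length
    # is added on every iteration, not 1 per word.
    word_count += len(wordslist)

print(word_count)      # 9
print(len(wordslist))  # 3
```

With n words the total comes out as n * n rather than n.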
Also note that your code:
for char in ': , . ! ?':
    data = data.replace(char,' ')
replaces each character from the supplied string (': , . ! ?') with a space. However, because that string itself contains spaces, you are needlessly replacing every space in data with a space four times over. It won't change the results, but it makes your code less efficient.
Instead you could achieve more correct results with code like this:
with open('book.txt','r') as file:
    data = file.read()
for char in ':,.!?':
    data = data.replace(char,' ')
word_count = len(data.split())  # count of words separated by whitespace
line_count = len(data.splitlines())  # count of lines in data
print(word_count,line_count)
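The same counting can also be wrapped in a small function that takes a string, which makes it easy to try without a file. This is only a sketch: the function name count_words_and_lines and the sample text are made up for illustration, and str.translate is swapped in for the replace loop to do all the substitutions in one pass.

```python
def count_words_and_lines(text, punctuation=":,.!?"):
    """Return (word_count, line_count), treating punctuation as separators."""
    # Map every punctuation character to a single space.
    table = str.maketrans(punctuation, " " * len(punctuation))
    cleaned = text.translate(table)
    return len(cleaned.split()), len(cleaned.splitlines())

# Hypothetical sample text.
sample = "Hello, world!\nHow are you?\n"
print(count_words_and_lines(sample))  # (5, 2)
```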
Addendum
It was also asked in comments how to get the character count. Assuming that the character count should include every character that is neither whitespace (tabs, newlines, etc.) nor in the list of special characters, it could be done with regular expressions:
import re
#original code that stripped out punctuation here
chars_only = re.sub(r"\s+", "", data, flags=re.UNICODE)
char_count = len(chars_only)
re.sub performs a regular expression substitution, replacing every match of the expression r"\s+" (one or more consecutive whitespace characters) with the second argument, an empty string in this case.
However, it should be noted that this char_count would include any punctuation characters that aren't in the original list of special punctuation characters (such as apostrophes).
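If those remaining punctuation characters should be excluded too, one possible approach is to strip whitespace and all ASCII punctuation in a single substitution. This is a sketch under that assumption: the helper name count_chars is made up, and string.punctuation only covers ASCII punctuation, so characters such as curly quotes would still be counted.

```python
import re
import string

def count_chars(text):
    """Count characters that are neither whitespace nor ASCII punctuation."""
    # Character class matching whitespace or any ASCII punctuation character.
    pattern = r"[\s" + re.escape(string.punctuation) + r"]+"
    return len(re.sub(pattern, "", text))

# Hypothetical sample text.
print(count_chars("Don't stop, it's fine!\n"))  # 15
```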