This is actually quite easy to do. First I'll cover counting words. The same approach works on a plain string, but for this example I'll be counting the words in a file.
Let's pretend that this is the content of our file, ourfile.txt:
Hello. This is a file.

Not the most exciting file.
Just be glad it isn't lorem ipsum.
Let's start by defining our function and reading from our text file:
def countWordsFile(file):
    # Open the file and read its entire content into one string
    with open(file) as f:
        query = f.read()
Great, now we have the file's content! Next we're going to define a new variable called n_split. This variable holds the string split at every \n. However, blank lines in the file produce empty strings in the resulting list, so we use the built-in filter function to remove them:
    n_split = list(filter(None, query.split('\n')))
    # Splits at every \n and removes the empty strings caused by blank lines
When we remove all empty list items, n_split now looks like this:
['Hello. This is a file.', 'Not the most exciting file.', "Just be glad it isn't lorem ipsum."]
For comparison, if we hadn't removed empty list items, n_split would look like this:
['Hello. This is a file.', '', 'Not the most exciting file.', "Just be glad it isn't lorem ipsum."]
...because of the one blank line in the file. Removing the empty items is a good safeguard: an empty string split at the spaces still yields a list with one (empty) item, so it would otherwise be counted as a word later on.
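To see the filter step in isolation, here is a minimal sketch that runs both variants on a plain string. The name sample is made up for this illustration; it stands in for the file's content and assumes a blank line after the first line, as the unfiltered list above suggests.

```python
# Hypothetical stand-in for the content of ourfile.txt (blank line after line 1)
sample = "Hello. This is a file.\n\nNot the most exciting file.\nJust be glad it isn't lorem ipsum."

# A plain split keeps the empty string produced by the blank line
unfiltered = sample.split('\n')

# filter(None, ...) drops every falsy item, and '' is falsy
n_split = list(filter(None, unfiltered))

print(unfiltered)
print(n_split)
```

The first list has four items, one of them an empty string; the second has only the three lines with text.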
Now our variable n_split contains every paragraph in the file, without any empty list items. The next step is splitting each paragraph into its individual words. To do this, we can simply iterate over every item in n_split and split it at the spaces:
    words = []
    for i in n_split:
        words.append(i.split(' '))
But we still have one last step. Because we split each item of a list, we now have a list of lists: one inner list of words for each paragraph. So we just have to combine them into one larger list. We can use a simple list comprehension for that (credit). Then we take the len of the flattened list and return it:
    return len([x for y in words for x in y])
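Putting the steps above together, the whole function might look like the sketch below. The logic is the same as in the snippets so far. The temporary file used to try it out is my own addition, and its content assumes a blank line after the first line of the sample text.

```python
import os
import tempfile


def countWordsFile(file):
    # Read the whole file into one string
    with open(file) as f:
        query = f.read()
    # Split into lines and drop the empty strings left by blank lines
    n_split = list(filter(None, query.split('\n')))
    # Split every line at the spaces, giving one list of words per line
    words = []
    for i in n_split:
        words.append(i.split(' '))
    # Flatten the list of lists and count the items
    return len([x for y in words for x in y])


# Write the sample text to a temporary file so the sketch runs anywhere
sample = "Hello. This is a file.\n\nNot the most exciting file.\nJust be glad it isn't lorem ipsum.\n"
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(sample)

word_count = countWordsFile(tmp.name)
os.remove(tmp.name)  # clean up the temporary file
print(word_count)
```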
And now we can access that value at any time just by calling print(countWordsFile('ourfile.txt')), which gives the output:
17
The exact number of words in the file! We have accomplished our goal.

Counting the characters in a string is even simpler: just call len(string). len is versatile: it returns the number of items in a list or the number of characters in a string. To wrap up, I believe this is a simple and reliable way to count the characters or words in a string or a file. I hope you learned something from this guide!
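As a quick illustration of the two uses of len mentioned above, here is a small sketch on the first line of the sample file. The variable names are made up for the example.

```python
text = "Hello. This is a file."

# len on a string counts characters, including spaces and punctuation
char_count = len(text)

# len on a list counts items, here the words produced by splitting at spaces
word_count = len(text.split(' '))

print(char_count)
print(word_count)
```

One caveat: split(' ') treats every single space as a separator, so two spaces in a row produce an empty string that gets counted as a word. Calling split() with no argument splits on any run of whitespace instead.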