
What is a functional way to count the number of characters in a file?

I would like to avoid using any modules, if possible. Due to my technical limitations and requirements, as much optimization as possible is required - and that includes omitting modules, since they take too much time to load in my situation (even though the load time is trivial). Thanks!

miike3459

1 Answer


Well, this is actually quite easy to do. First I'll cover counting words (the same approach works on a file or a plain string, but for this example I'll count the words in a file).

Let's pretend that this is the content of our file ourfile.txt:

Hello. This is a file.

Not the most exciting file.
Just be glad it isn't lorem ipsum.

Let's start by defining our function and reading from our text file:

def countWordsFile(file):
   with open(file) as f:
      query = f.read()

Great, now we have the file's content! Next we're going to define a new variable called n_split, which holds the string split at every \n. When the file contains blank lines, though, the split leaves empty strings in the list. So we use filter with None as its first argument to remove those empty values:

n_split = list(filter(None, query.split('\n'))) 
# Splits at every \n, and removes empty list values caused by line breaks

When we remove all empty list items, n_split now looks like this:

['Hello. This is a file.', 'Not the most exciting file.', "Just be glad it isn't lorem ipsum."]

For comparison, if we hadn't removed empty list items, n_split would look like this:

['Hello. This is a file.', '', 'Not the most exciting file.', "Just be glad it isn't lorem ipsum."]

...because of the one blank line in the file. Removing these matters later on: splitting an empty string at spaces gives [''], which would be counted as an extra word.
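To see the effect of the filter on its own, here is a minimal sketch that uses a plain string in place of the file. The variable names text, raw and cleaned are only for illustration and are not part of the function we're building:

```python
# Same content as ourfile.txt, written as a string literal for the demo
text = "Hello. This is a file.\n\nNot the most exciting file.\nJust be glad it isn't lorem ipsum."

raw = text.split('\n')             # the blank line shows up as ''
cleaned = list(filter(None, raw))  # filter(None, ...) drops falsy items such as ''

print(len(raw))      # 4
print(len(cleaned))  # 3
```

Keep in mind that filter(None, ...) only drops values that are falsy, so a line made up of spaces alone would not be removed.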

Now our variable n_split contains every paragraph in the file, without any empty list items. The next step is to split each paragraph into its individual words. To do this, we can simply iterate over every item in n_split and split it at the spaces:

words = []
for i in n_split:
   words.append(i.split(' '))

But we still have one last step. Because we split every item of a list, and each split returns a list of its own, we now have a list that contains a separate list of words for each paragraph. So we just have to combine them into one larger list. We can use a simple list comprehension for that (credit), then take the len of the resulting list and return it:

return len([x for y in words for x in y])
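Putting the steps above together, the whole thing might look like the sketch below. The helper count_words is an addition of mine (it is not in the snippets above) so that the same logic can be used on a plain string as well as on a file:

```python
def count_words(text):
    # Split into lines and drop the empty strings left by blank lines
    n_split = list(filter(None, text.split('\n')))

    # Split each line at single spaces: one list of words per line
    words = []
    for line in n_split:
        words.append(line.split(' '))

    # Flatten the nested lists into one list and count its items
    return len([x for y in words for x in y])


def countWordsFile(file):
    # Read the whole file as text, then reuse the string version
    with open(file) as f:
        return count_words(f.read())
```

Note that split(' ') treats every single space as a separator, so two spaces in a row would produce an empty string that gets counted as a word. Calling split() with no argument, as suggested in the comments below, avoids that.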

And now we can access that value at any time just by calling print(countWordsFile('ourfile.txt')) and get the output:

17

That's the exact number of words in the file, so we have accomplished our goal. Getting the number of characters in a string is even simpler: just call len(string). len is versatile here, since it returns the number of items in a list or the number of characters in a string. To wrap up, I believe this is a very simple way to count characters or words in a string or a file. I hope you learned something from this guide!
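For the character count itself, a minimal sketch with no imports could look like this. The function names count_chars and count_chars_file are made up for this example:

```python
def count_chars(text):
    # len on a string returns its number of characters
    return len(text)


def count_chars_file(path):
    # Text mode: the result is a count of characters, not bytes
    with open(path) as f:
        return len(f.read())


print(count_chars("Hello. This is a file."))  # 22
```

This counts every character, including spaces and newlines. Because the file is opened in text mode, the result can differ from the file's size in bytes, for example with non-ASCII text or Windows line endings.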

miike3459
  • You don't use a generator function here, but a list-comprehension. Anyways, nice explanation. :) – Austin Oct 13 '18 at 13:46
  • why not simply: `return len( [x for x in f.read().split() if x.strip()] )` ? – Patrick Artner Oct 13 '18 at 13:49
  • @PatrickArtner While that is a great idea, this is more of a beginners tutorial (I broke it down in parts easier to understand). And, this longer method is easier to manipulate. – miike3459 Oct 13 '18 at 13:51
  • @PatrickArtner That's interesting because I have tried this multiple times using just that, and it is absolutely fine. https://repl.it/@xMikee/Word-Counter – miike3459 Oct 13 '18 at 13:54
  • this is a listcomp: `return len([x for y in words for x in y])` ... this is a generator comp: `return len( (x for y in words for x in y) )` - the latter does not work - you are always using list comps – Patrick Artner Oct 13 '18 at 13:56
  • Oh, yes. I did edit the post accordingly. Sorry, I got mixed up between generators and list comps. – miike3459 Oct 13 '18 at 13:57
  • @PatrickArtner, Yes, generator comps don't have `len()`. OP actually used list-comprehension but worded it as a generator comp. – Austin Oct 13 '18 at 14:00