
I am working on a program where I need to count every token (letters, numbers, symbols, etc.) in a text file. However, when I try to use the len function on the whole file it displays TypeError: object of type '_io.TextIOWrapper' has no len().

My question is basically: how would you count every token in a text file?

def getCountry(Filename):
    o = open((Filename),'r')

    return o
def menu(Option,Entry):

    if Option == 'A':


        J = len(Entry)

        return J

    if Option == 'B':
        num = 0
        for line in Entry:
            found = sum(line.count(xx) for xx in ('and','del','from','not','while','as','elif', 'global','or','with','assert','else', 'if','pass','yield','break','except','import','print','class','exec','in','rise','continue','finally','is', 'return', 'def', 'for', 'lambda', 'try'))
            num = line.split()
            num2 = len(num)


        Per = ("%.2f" % (found/num2*100))
        Convert = (Per,"Percent")
        return Convert
    if Option == 'C':
        num_chars = 0
        for line in Entry:
            found = sum(line.count(xx)for xx in ('+','-','*','/','.','!','@','#','$','%','^','&','(',')','=','?'))
            num_chars= len(line)
        Per = found/num_chars
        return Per
    if Option == 'E':
        for line in Entry:
            Space = line.count(' ')
        return Space



def main():
    Filename = input('Input filename: ')
    Entry = getCountry(Filename)

    print('Now from the options below choose a letter') # list of choices
    print('A)Counts the number of tokens (word, symbols, numbers)')
    print('B)Counts the number of selected Python key word (e.g. if, while, …)')
    print('and returns the % of tokens that are key)')
    print('C)Counts the number of selected programming symbols (e.g. +, : , …) and returns the % of tokens that are symbols')
    print('D)Receives the metrics for a program and returns a score that compares the two programs metrics')
    print('E) Counts all of the white spaces in program')
    Option = input('Enter Letter for option: ')
    while Option not in ('A', 'B', 'C','D','E'):#input validation statement
        Option = str(input('Enter Capital Letter: '))
    Answer2 =menu(Option,Entry)
    print(Answer2)

main()
Artjom B.
  • How do you define a token? What have you tried? – Zizouz212 Nov 15 '15 at 23:13
  • Maybe see: [How to check file size in python?](http://stackoverflow.com/questions/2104080/how-to-check-file-size-in-python) – Peter Wood Nov 15 '15 at 23:17
  • A token meaning any character. I have opened the file in a previous function and passed it, already opened, into my new function; however, when I try to print it out it gives me TypeError: object of type '_io.TextIOWrapper' has no len() – Jr Sullivan Nov 16 '15 at 00:10
  • That's not a token, but a character. You need to convert the `_io.TextIOWrapper` to a `str` by calling `.read()` on the fileobj. Then you can find its length. – 4ae1e1 Nov 16 '15 at 00:28
  • @4ae1e1 thank you, I needed to read the file again. However, is there any specific reason it did not convert when I passed it into the main function? – Jr Sullivan Nov 16 '15 at 00:33
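
A minimal sketch of what the comments above suggest, reading the file into a str and then taking its length (the filename example.txt is only a placeholder):

# Minimal sketch based on the comments above; 'example.txt' is a placeholder.
with open('example.txt', 'r') as f:
    text = f.read()      # the file's contents as a str
print(len(text))         # number of characters, not Python tokens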

2 Answers


You need to use the open() and read() functions.

For example:

# Your file is called "ex.txt"
openit = open("ex.txt", "r") # open() with "r" means to get "read" privileges.
readit = read(openit)
print len(readit)

I'm guessing that'll give you the result you're looking for, although I don't know if the len() function works correctly for all types of characters (like "À" and "ö", etc.).

Hans VK
  • `len` works fine on Unicode strings (as returned in Python 3 by that code); you need `openit.read()` though, **not** `read(openit)`, which will just raise an exception. However `len` returns the number of *characters* while the OP is asking for the number of *tokens*, a very different thing. – Alex Martelli Nov 16 '15 at 00:29
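
Putting the answer together with the comment's fix, a corrected Python 3 sketch (keeping the answer's example filename "ex.txt") could look like this:

# Corrected sketch of the answer's approach (Python 3); "ex.txt" is the
# answer's example filename.
openit = open("ex.txt", "r")   # "r" opens the file for reading
readit = openit.read()         # read the whole file into a str
openit.close()
print(len(readit))             # number of characters, not tokens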

To count all Python tokens in a file, one compact way is:

import io
import tokenize

def count_tokens(filename):
    i = 0
    with io.open(filename, 'rb') as f:   # tokenize.tokenize needs a bytes readline
        for i, t in enumerate(tokenize.tokenize(f.readline), 1):
            pass                         # only the running count i is needed
    return i
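
For instance (the filename metrics.py below is just a hypothetical example, standing in for any Python source file):

# Hypothetical usage of count_tokens; 'metrics.py' is a made-up filename.
print(count_tokens('metrics.py'))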

You seem to want to do many more diverse things in your program, but this is the one you ask about in the Q's subject.

The tokenize.tokenize generator yields 5-tuples starting with the token type, followed by the token string, two pairs giving the coordinates (row, column) of the start and end of the token, and finally the logical line where the token occurs (it's a named tuple, so you can access the 5 items with symbolic names, too).
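
A quick way to see those fields (this loop is only an illustration, not part of the answer; 'source.py' is a placeholder filename):

# Print the TokenInfo fields yielded by tokenize.tokenize;
# 'source.py' is a placeholder filename.
import io
import tokenize

with io.open('source.py', 'rb') as f:
    for tok in tokenize.tokenize(f.readline):
        # tok.type, tok.string, tok.start, tok.end and tok.line are all available
        print(tokenize.tok_name[tok.type], repr(tok.string), tok.start, tok.end)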

Some of your other tasks might benefit from that, just needing some inspection of each token -- maybe filtering the tokens generator with a predicate via filter (or in Python 2.7 itertools.ifilter).
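
As a rough sketch of that filtering idea (an assumption about what option B in the question wants, not code from the answer), keyword.iskeyword can serve as the predicate:

# Count NAME tokens that are Python keywords by filtering the token stream
# with a predicate. 'source.py' is a placeholder filename.
import io
import keyword
import token
import tokenize

def count_keyword_tokens(filename):
    def is_kw(t):
        # predicate: a NAME token whose string is a Python keyword
        return t.type == token.NAME and keyword.iskeyword(t.string)
    with io.open(filename, 'rb') as f:
        return sum(1 for _ in filter(is_kw, tokenize.tokenize(f.readline)))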

However, best practice for StackOverflow is "one question per question", so I would encourage you to work based on these hints and, if something specifically (other than just counting all tokens) stumps you, open a separate question -- showing what code you have, what you expect from it, what's happening instead (on a small example file).

Alex Martelli