Finding longest word in a txt file

Question

I am trying to create a function that takes a filename as a parameter and returns the longest word in the file with its line number attached to the front. This is what I have so far, but it is not producing the output I expect.

def word_finder(file_name):
    with open(file_name) as f:
        lines = f.readlines()
        line_num = 0
        longest_word = None
        for line in lines:
            line = line.strip()
            if len(line) == 0:
                return None
            else:
                line_num += 1
                tokens = line.split()
                for token in tokens:
                    if longest_word is None or len(token) > len(longest_word):
                        longest_word = token
            return (str(line_num) + ": " + str(longest_word))

Please include your sample input, actual output and expected output. At a cursory glance, though, your `return` statement is inside the `for line in lines:` loop, so the function will exit right after the 1st iteration. — ewokx, Jun 01 '22 at 00:24
Need some more details. How is the text formatted in the txt file (one big block, multiple lines...)? How are words separated (whitespace, ","...)? — Drakax, Jun 01 '22 at 00:26
It would generally be better to return a tuple of `(line_num, longest_word)` and let the caller format that as needed. Also, why return None if a line is blank? — jarmod, Jun 01 '22 at 00:35
Side-note: There is almost never a need to call `f.readlines()`. Instead of doing `lines = f.readlines()`, then doing `for line in lines:`, just do `for line in f:`; files are lazy iterators over their lines, and iterating the lines from the file object directly means you only need to store one line at a time, so your memory usage is proportionate to the longest line in the file, not the total file size (for a multi-GB file, the difference could easily make or break your program). — ShadowRanger, Jun 01 '22 at 00:35
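
The two suggestions above (iterate over the file object directly, and return a tuple instead of a formatted string) can be combined in a minimal sketch. The function name `longest_word_with_line` and the decision to count blank lines in the numbering are assumptions made for illustration; they are not part of the original code.

```python
def longest_word_with_line(file_name):
    """Return (line_number, word) for the longest word, or None if the file has no words."""
    best = None  # (line_number, word) of the longest word seen so far
    with open(file_name) as f:
        # Iterating over f reads one line at a time instead of loading the whole file.
        for line_num, line in enumerate(f, start=1):
            for token in line.split():  # a blank line yields no tokens
                if best is None or len(token) > len(best[1]):
                    best = (line_num, token)
    return best
```

The caller can then format the result as needed, for example with `f"{num}: {word}"` after unpacking the tuple.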

hc_dev · Answer 1 · 2022-06-01T01:28:12.620

Issue

Exactly what ewokx diagnosed:

The last return statement is indented too deeply, so it sits inside the for loop.

Currently:

The function returns the longest word in the first line only.
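
The effect of the indentation can be seen in isolation with a toy example. The two function names below are hypothetical and exist only to contrast a `return` inside the loop body with a `return` placed after the loop.

```python
def return_inside_loop(items):
    for item in items:
        return item  # runs during the first iteration, so the loop never continues


def return_after_loop(items):
    last = None
    for item in items:
        last = item
    return last  # runs once, after every item has been visited


print(return_inside_loop(["a", "b", "c"]))  # a
print(return_after_loop(["a", "b", "c"]))   # c
```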

Solution

The return statement should be aligned with the loop's column so that it executes after the loop has finished.

def word_finder(file_name):
    with open(file_name) as f:
        lines = f.readlines()
        line_num = 0
        longest_word = None
        for line in lines:
            line = line.strip()
            if len(line) == 0:
                return None
            else:
                line_num += 1
                tokens = line.split()
                for token in tokens:
                    if longest_word is None or len(token) > len(longest_word):
                        longest_word = token
            # return here would exit the loop too early after 1st line
        # loop ended
        return (str(line_num) + ": " + str(longest_word))

Then:

the longest word in the file with the line number attached to the front of it.

Improved

def word_finder(file_name):
    with open(file_name) as f:
        line_word_longest = None  # global max: tuple of (line-number, longest_word)
        for i, line in enumerate(f, start=1):  # 1-based line number and line content
            line = line.strip()
            if len(line) > 0:   # split words only if line present
                max_token = max(line.split(), key=len)  # longest token of this line, measured by length
                if line_word_longest is None or len(max_token) > len(line_word_longest[1]):
                    line_word_longest = (i, max_token)
        # loop ended
        if line_word_longest is None:
            return "No longest word found!"
        return f"{line_word_longest[0]}: '{line_word_longest[1]}' ({len(line_word_longest[1])} chars)"

See also:

Some SO research for similar questions:

inspiration from all languages: longest word in file
only python: [python] longest word in file
non python: -[python] longest word in file

In the improved code, did you mean `max_token = max([token for token in line.split()], key=len)` (list comprehension, then max of tokens)? Even better is to use a generator rather than a list comprehension; note that it needs its own parentheses when `key` is also passed: `max_token = max((token for token in line.split()), key=len)` — DarrylG, Jun 01 '22 at 01:00
@DarrylG thanks, not what we think (longest = max token), but what we measure (longest = max length) — hc_dev, Jun 01 '22 at 01:09

score 0 · Answer 2 · answered Jun 01 '22 at 00:38

I think this is the shortest way to find the word. Correct me if I'm wrong.

def wordFinder(filename):
    with open(filename, "r") as f:
        words = f.read().split()  # split() returns a list of the words in the file
        longestWord = max(words, key=len)  # key=len makes max() compare the words by their length
        print(longestWord)  # prints the longest word

x07ex

OP needs line number. – jarmod Jun 01 '22 at 00:39
The application of _max by length_ function is elegant! – hc_dev Jun 01 '22 at 00:55
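
The same max-by-length idea can also carry the line number that the question asks for. This is a minimal sketch, not the answerer's code: the name `word_finder_with_line` is hypothetical, and it assumes that words are separated by whitespace and that line numbers start at 1.

```python
def word_finder_with_line(filename):
    with open(filename, "r") as f:
        # Pair every word with the 1-based number of the line it appears on.
        pairs = (
            (line_num, word)
            for line_num, line in enumerate(f, start=1)
            for word in line.split()
        )
        # max() compares the pairs by word length; default=None covers a file with no words.
        return max(pairs, key=lambda pair: len(pair[1]), default=None)
```

Because `max()` returns the first maximal item it encounters, a tie between equally long words resolves to the earliest one in the file. The generator is consumed inside the `with` block, so the file is still open while `max()` reads from it.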
