-4

I am fairly new to files in Python and want to find the words in a file that have, say, 8 letters in them, print them, and keep a running total of how many there are. Can you look through a file as if it were one very large string, or is there a specific way it has to be done?

MoneyMaker12
  • So, you read the file, for example, line by line, split each line into words, get the length of each word, print and increase the counter accordingly. Which step are you having trouble with? – bereal Feb 21 '16 at 17:36
  • Splitting them line by line and getting the length – MoneyMaker12 Feb 21 '16 at 17:39
  • Files are iterable, so once you open a file, you can iterate through lines as `for line in file:`, splitting by words is done as `line.split()`, word length is `len(word)`. – bereal Feb 21 '16 at 17:47
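
Put together, the steps from the comments above might look like the following. This is only a minimal sketch: the function name `count_words_of_length` and the `io.StringIO` object standing in for an opened file are assumptions made for illustration, not something from the original posts.

```python
import io

def count_words_of_length(lines, length=8):
    """Print each word of the given length and return how many were found."""
    total = 0
    for line in lines:              # a file object yields one line at a time
        for word in line.split():   # split() with no argument splits on any whitespace
            if len(word) == length:
                print(word)
                total += 1
    return total

# io.StringIO stands in for an opened file here (an assumption for the demo).
# With a real file you would write:
#     with open("input.txt") as f:
#         total = count_words_of_length(f)
sample = io.StringIO("boomhead shot\nshamwow slapchop\n")
total = count_words_of_length(sample)
print("Total:", total)
```

Because the file is read one line at a time, this approach should also cope with files too large to hold comfortably in memory. Note that `split()` leaves punctuation attached to words, so "slapchop!" would count as 9 characters.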

3 Answers

2

You could use Python's Counter for doing this:

from collections import Counter
import re

with open('input.txt') as f_input:
    text = f_input.read().lower()
    words = re.findall(r'\b(\w+)\b', text)
    word_counts = Counter(w for w in words if len(w) == 8)

    for word, count in word_counts.items():
        print(word, count)

This works as follows:

  1. It reads in a file called input.txt, as one very long string.

  2. It then converts it all to lowercase to make sure the same words with different case are counted as the same word.

  3. It uses a regular expression to split all of the text into a list of words.

  4. It uses a generator expression to pass every word that has a length of 8 characters to a Counter, which tallies how often each one occurs.

  5. It displays all of the matching entries along with the counts.
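
The Counter holds one count per distinct word, so if you want the single overall total the question asks for, you can probably sum its values. The sketch below assumes a short sample string in place of the file contents; the variable names `distinct` and `total` are made up for illustration.

```python
from collections import Counter
import re

# Sample text standing in for f_input.read() (an assumption for the demo)
text = "Elephant elephant and a kangaroo met another ELEPHANT".lower()
words = re.findall(r'\b(\w+)\b', text)
word_counts = Counter(w for w in words if len(w) == 8)

distinct = len(word_counts)          # number of different 8-letter words
total = sum(word_counts.values())    # number of 8-letter words, repeats included

print(distinct, "distinct words,", total, "in total")
```

Keep in mind that `\w` also matches digits and underscores, so a token such as `abc_1234` would be counted as an 8-character word too.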

Martin Evans
1

Try this code, where "eight_l_words" is a list of all the eight-letter words and "number_of_8lwords" is the number of eight-letter words:

 # reads the text to be used ("file_location" is a placeholder for your path)
 with open("file_location", "r") as your_file:
     text = your_file.read()

 # divides the text into lines and defines some lists
 lines = text.split("\n")
 words = []
 eight_l_words = []

 # iterating through "lines", adding each separate word to the "words" list
 for each in lines:
     words += each.split(" ")

 # checking to see if each word in the "words" list is 8 chars long, and if so
 # appending that word to the "eight_l_words" list
 for each in words:
     if len(each) == 8:
         eight_l_words.append(each)

 # finding the number of eight-letter words
 number_of_8lwords = len(eight_l_words)

 # displaying results
 print(eight_l_words)
 print("There are "+str(number_of_8lwords)+" eight letter words")

Running the code with

 text = "boomhead shot\nshamwow slapchop"

yields the results:

 ['boomhead', 'slapchop']
 There are 2 eight letter words
Mushroom Man
0

There's a useful post from 2 years ago called "How to split a text file to its words in python?"

It describes splitting each line by whitespace. If you have punctuation such as commas and full stops in there, then you'll have to be a bit more sophisticated. There's help in "Python - Split Strings with Multiple Delimiters".

You can use the function len() to get the length of each individual word.
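
As a rough illustration of the punctuation issue, here is one way it could be handled with `re.split`. This is a sketch under assumptions: the helper name `words_of_length` and the choice to treat only ASCII letters and apostrophes as word characters are mine, not taken from the linked posts.

```python
import re

def words_of_length(text, length=8):
    # Split on runs of anything that is not an ASCII letter or apostrophe,
    # so commas, full stops, spaces and newlines all act as delimiters.
    tokens = re.split(r"[^A-Za-z']+", text)
    # Empty strings produced at the edges have length 0 and are filtered out
    # by the length check (for any length greater than 0).
    return [t for t in tokens if len(t) == length]

sample = "Boomhead, shot. Shamwow; slapchop!"
matches = words_of_length(sample)
print(matches)
print(len(matches))
```

A plain `sample.split()` would give "Boomhead," and "slapchop!" with the punctuation attached, making both 9 characters long, so neither would be counted.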

Community
DMG