get two characters, count and print from a .txt file

Question

The program I wrote is terrible, so if anyone can write a different .py script and give me some pointers, thanks a bunch!

What I would like help with is printing only the first two characters of each word and, if the same two characters repeat, keeping a count of them. It's like a Zipf distribution, but for the first two letters of every word. The example below shows how I would like the output to look. Here is an example using this sentence, just for show:

text file = "Here is an example from this text and its for show!"

```
an 2
He 1
is 1
ex 1
fr 1
th 1
te 1
it 1
fo 1
sh 1
total 11
```

```python
file = open(r"C:\python37\paradise.txt", 'r')

while True:
    # read two characters at a time
    char = file.read(2)
    if not char:
        break

    print(char)

file.close()
```
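For reference, `file.read(2)` returns the next two characters of the raw text on each call, regardless of where words start, so this loop prints fixed-size chunks rather than word prefixes. A small sketch, using `io.StringIO` as a stand-in for the file so it runs without `paradise.txt` (`read_chunks` is a hypothetical helper name):

```python
import io

def read_chunks(stream, size=2):
    """Collect successive fixed-size reads until the stream is exhausted."""
    chunks = []
    while True:
        chunk = stream.read(size)
        if not chunk:  # an empty string means end of file
            break
        chunks.append(chunk)
    return chunks

# io.StringIO behaves like a text file opened with open(..., 'r')
print(read_chunks(io.StringIO("Here is")))  # ['He', 're', ' i', 's']
```

Note the `' i'` chunk: the space is counted as a character, and the chunks drift out of step with the words.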

cross-posted at [python-forum.io](https://python-forum.io/Thread-get-two-characters-count-and-print-from-a-txt-file) — buran, Oct 03 '20 at 05:39

Moosa Saadat · Accepted Answer · 2020-10-03T05:56:45.627

You don't have to read the file character by character; that takes much more effort than needed. Here's a better way:

```python
# dictionary to store the count of each word prefix (2 characters), e.g. "an": 2
wordDict = {}

file = open("paradise.txt", 'r')
# read each line in the file
for line in file:
    # read each word in the line
    for word in line.split():
        # get only the first two letters of the word
        word = word[:2]
        # if the prefix is not in the dictionary, add it
        if word not in wordDict:
            wordDict[word] = 1
        # otherwise, increment its count
        else:
            wordDict[word] += 1

file.close()

# print all values
for key, val in wordDict.items():
    print(key, val)

# print the total
print(f"Total: {sum(wordDict.values())}")
```

Explanation: To store the counts, we create a dictionary whose keys are the two-letter words (prefixes) and whose values are their counts, e.g.:

```python
{
  "an": 2,
  "He": 1
}
```

Then, we read the file content line by line. We split the line into words and get the first two letters of each word.

Next, we add those two-letter words to our dictionary `wordDict` along with their respective counts.

Note: it is recommended to open files using the `with` keyword:

```python
with open('paradise.txt') as file:
    ...
```

This way, the file is closed automatically, even if an error occurs inside the block.
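Applying that note, the same counting loop can be written with `with` and `dict.get`. This is a sketch, not the answer's exact code: `count_prefixes` is a hypothetical helper name, the path is passed in rather than hard-coded, and the demo writes the question's sentence to a temporary file in place of `paradise.txt`:

```python
import os
import tempfile

def count_prefixes(path):
    """Count the first two characters of every whitespace-separated word in a file."""
    counts = {}
    with open(path, 'r') as file:  # the file is closed automatically on exit
        for line in file:
            for word in line.split():
                prefix = word[:2]
                # dict.get returns 0 for a prefix that has not been seen yet
                counts[prefix] = counts.get(prefix, 0) + 1
    return counts

# Demo: write the sample sentence to a temporary file and count it
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("Here is an example from this text and its for show!\n")
counts = count_prefixes(tmp.name)
os.remove(tmp.name)

for key, val in counts.items():
    print(key, val)
print(f"Total: {sum(counts.values())}")
```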

Thanks for the help, Moosa; however, I received this error with your code: `wordDict[word] += 1` `KeyError: 'of'` — Tom E. O'Neil, Oct 03 '20 at 05:54
Sorry, I missed the **not** keyword while updating the code. The condition should be `if word not in wordDict:`. I will correct the mistake — Moosa Saadat, Oct 03 '20 at 05:56
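The `KeyError` comes from `wordDict[word] += 1` running for a prefix that is not in the dictionary yet: `+=` has to read the existing value first. Besides the `if word not in wordDict:` check, `collections.defaultdict(int)` also avoids the error, because a missing key starts at `0`. A minimal sketch with made-up sample words:

```python
from collections import defaultdict

plain = {}
try:
    plain["of"] += 1  # reading a missing key raises KeyError
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'of'

counts = defaultdict(int)  # int() == 0 supplies the starting value
for word in "of the other of".split():
    counts[word[:2]] += 1
print(dict(counts))  # {'of': 2, 'th': 1, 'ot': 1}
```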

OnceUponATime · Answer 2 · 2020-10-12T08:37:12.083

I would create a result list, `results = []`, and add the two-letter chunks to it. Then I would count the occurrences of each unique item in that list.

Is that what you want?

Here is a link that will help you count unique items in a Python list:

How do I count unique values inside a list

The answer by Vidul is the one that relates to your question:

"In addition, use collections.Counter to refactor your code:

```python
from collections import Counter

words = ['a', 'b', 'c', 'a']

Counter(words).keys() # equals to list(set(words))
Counter(words).values() # counts the elements' frequency
```

Output:

```
['a', 'c', 'b']
[2, 1, 1]
```

"

The `words` list in Vidul's example would be `results` in yours.
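Put together, a sketch of this approach on the question's sample sentence (the text is hard-coded here instead of being read from `paradise.txt`). `Counter.most_common()` sorts by count, and elements with equal counts keep the order in which they were first encountered, which happens to reproduce the ordering in the question:

```python
from collections import Counter

text = "Here is an example from this text and its for show!"

# the two-letter chunk of every word
results = [word[:2] for word in text.split()]

counts = Counter(results)
for prefix, count in counts.most_common():  # highest count first
    print(prefix, count)
print("total", sum(counts.values()))
```

This prints `an 2` first, then the remaining prefixes with a count of 1, then `total 11`.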

score 0 · Answer 3 · edited Oct 03 '20 at 05:50

You can do this simply by loading the whole text file into a list of separate words, splitting on whitespace.

Once you have all the words in a list, you can loop through it and take only the first 2 letters of each word with the slice `[:2]`. From there, you can add each two-letter prefix to a dictionary and count up.

Below is example code that works, along with its output:

```python
filePath = "C:\\Users\\Oddity\\Documents\\Python\\STACKEXCHANGE\\test.txt"

# Read the file into a list with each word as a separate entry
with open(filePath, 'r') as file:
    # split() with no argument splits on any whitespace (including newlines)
    # and never produces empty strings
    data = file.read().split()

# Loop through each word, take the first two characters,
# then update the dict
characters = {}
for each in data:
    if each[:2] in characters:
        characters[each[:2]] += 1
    else:
        characters[each[:2]] = 1

print(characters)
```

Output:

```
{'He': 2, 'is': 2, 'an': 4, 'ex': 2, 'fr': 2, 'th': 2, 'te': 2, 'it': 2, 'fo': 2, 'sh': 2}
```
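One caveat with `.split(' ')`: it splits on every single space, so doubled spaces or a trailing newline (turned into a trailing space by `.replace('\n', ' ')`) produce empty strings, and `''[:2]` would then be counted under an empty key `''`. Calling `.split()` with no argument splits on any run of whitespace and drops empty strings. A quick comparison on a made-up string:

```python
raw = "Here is  an\nexample\n"  # double space and trailing newline

by_space = raw.replace('\n', ' ').split(' ')
by_whitespace = raw.split()

print(by_space)       # ['Here', 'is', '', 'an', 'example', '']
print(by_whitespace)  # ['Here', 'is', 'an', 'example']
```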

score 0 · Answer 4 · answered Oct 03 '20 at 05:47

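```python
file = open(r"C:\python37\paradise.txt", 'r')
wordDictionary = {}
keys = []  # separate list of the prefixes seen so far
para = file.read()
# file.read() returns one string, so split it into lines first
for line in para.splitlines():
    for word in line.split():
        letters = word[0:2]
        if letters in keys:
            wordDictionary[letters] += 1  # Python has no ++ operator
        else:
            keys.append(letters)
            wordDictionary[letters] = 1
print(wordDictionary)
file.close()
```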

This is a very simple approach: I open the file, read it, and then iterate through every word. For each word, I take the first 2 letters and check whether they are already among the keys of the dictionary (I maintained a separate list for this). If so, I add one to the count; otherwise, I create a new entry.
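The separate `keys` list is not strictly needed: `letters in wordDictionary` checks the dictionary's own keys directly, and a dictionary lookup takes roughly constant time on average, whereas `in` on a list scans it element by element. A sketch of the same loop without the extra list (the sample text is hard-coded as an assumption, in place of the file):

```python
text = "Here is an example from this text and its for show!"

wordDictionary = {}
for line in text.splitlines():
    for word in line.split():
        letters = word[0:2]
        if letters in wordDictionary:  # membership test on the dict's keys
            wordDictionary[letters] += 1
        else:
            wordDictionary[letters] = 1

print(wordDictionary)
```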

aahnik · Answer 5 · 2020-10-03T06:18:03.943


I hope this solves it:


Output:

```
He : 1
is : 1
an : 2
ex : 1
fr : 1
th : 1
te : 1
it : 1
fo : 1
sh : 1
```

answered Oct 03 '20 at 05:51 by aahnik · edited Oct 03 '20 at 06:18

