Counting the number of repeated entries in a file

Question

When I read a file, it gives me an output like this:

CW  0.000000  0.003822  0.006380  0.005100  0.016987  0.307042
CW  0.007136  0.019635  0.329683  0.315180  0.302634  0.007076
CW  0.015666  0.299244  0.290860  0.292623  0.325943  0.005236
CS  0.022060  0.288761  0.311449  0.289165  0.289937  0.317213
CS  0.019635  0.040511  0.301167  0.011418  0.295902  0.017166
CS  0.020990  0.345277  0.352370  0.034237  0.020962  0.015749

I want to count the total number of CW and CS in the file. The output should look like this:

3 #For CW 
3 #For CS

I tried using the following code:

with open("file", 'r') as rf:
    v = rf.read().split('\n')

i = []
for e in v[1::47]:  # (only the names)
    r = e[:12]
    s = r[:2]
    q = sum(c != ' ' for c in s)
    print(q)

But it gives me this output

I even tried importing Counter, but it gives me output like this:

C 1
W 1
C 1
W 1
C 1
S 1

Please suggest some method so that I can get the expected output. Any help will be highly appreciated.

Read the file line by line, and use a dictionary to keep track of the counts. — Tim Biegeleisen, Feb 15 '20 at 06:14
[this](https://stackoverflow.com/questions/1155617/count-the-number-occurrences-of-a-character-in-a-string) and [this](https://stackoverflow.com/questions/8009882/how-to-read-a-large-file-line-by-line) have your answer. — Aven Desta, Feb 15 '20 at 06:23
@TimBiegeleisen I was reading the file using read().split(). When I tried readlines(), I didn't get the lines (e.g. CW, CW) that I was previously getting with read().split(). Counting on the read().split() result gives me the same answer I mentioned earlier. — Sukrut Shishupal, Feb 15 '20 at 06:36
@Babydesta I did take a look at those questions before posting mine. Unfortunately, the code from those answers is not working for me. — Sukrut Shishupal, Feb 15 '20 at 06:38

Pynchia · Accepted Answer · 2020-02-15T06:53:50.180

2

Indeed, use Counter:

from collections import Counter
with open("xyz.txt") as f:
    c = Counter(line.split()[0] for line in f)
    for k,n in c.items():
        print(k, n)

with an input file of

CW  0.000000  0.003822  0.006380  0.005100  0.016987  0.307042 1
CW  0.007136  0.019635  0.329683  0.315180  0.302634  0.007076 1
CW  0.015666  0.299244  0.290860  0.292623  0.325943  0.005236 1
CS  0.022060  0.288761  0.311449  0.289165  0.289937  0.317213 1
CS  0.019635  0.040511  0.301167  0.011418  0.295902  0.017166 1
CS  0.020990  0.345277  0.352370  0.034237  0.020962  0.015749 1

produces

CW 3
CS 3
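
If the real file contains blank lines (an assumption; the sample does not show any), `line.split()[0]` raises an IndexError on them, because splitting an empty line yields an empty list. A minimal sketch that skips such lines and prints in the format requested in the question. The name `count_labels` and the `io.StringIO` stand-in for the file are illustrative, not part of the answer:

```python
import io
from collections import Counter

def count_labels(lines):
    """Count the first whitespace-separated field of each non-blank line."""
    return Counter(line.split()[0] for line in lines if line.strip())

# Stand-in for open("xyz.txt"); includes a blank line on purpose.
sample = io.StringIO(
    "CW  0.000000  0.003822\n"
    "CW  0.007136  0.019635\n"
    "\n"
    "CS  0.022060  0.288761\n"
)

counts = count_labels(sample)
for label, n in counts.items():
    print(f"{n} #For {label}")
```

Passing any iterable of lines (an open file, a list, a `StringIO`) works the same way, since the function only iterates over it.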

edited Feb 15 '20 at 06:53

answered Feb 15 '20 at 06:29

Pynchia


Thanks, this worked. When I was trying Counter method, I didn't specify the line break in between and that's why it was giving me that answer. Thank you so much. – Sukrut Shishupal Feb 15 '20 at 07:09
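
The per-character output shown in the question (C 1, W 1, ...) is consistent with Counter being given a single string: a string is iterated character by character, so each letter is counted separately. A small sketch of the difference, where the `labels` list is a hypothetical stand-in for the first fields read from the file:

```python
from collections import Counter

# A string is iterated character by character.
per_char = Counter("CW")

# A list of labels is iterated item by item.
labels = ["CW", "CW", "CW", "CS", "CS", "CS"]
per_label = Counter(labels)

print(per_char)
print(per_label)
```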

abhiarora · Answer 2 · 2020-02-15T06:41:39.433

0

"I want to count the total number of CW and CS in the file."

Try this:

di = { }
with open("file", "r") as f:
    for l in f:
        l = l.strip()
        di[l] = di[l] + 1 if l in di else 1


for k, v in di.items():
    print("Line: %s and Count: %d" % (k, v))

Output:

Line: CW and Count: 3
Line: CS and Count: 3
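
Note that the snippet above uses the whole stripped line as the dictionary key, so it prints a count of 3 only when each line consists of the label alone. For lines like those in the question, where the label is followed by numbers, one option is to key on the first field instead. A minimal sketch, with `io.StringIO` as a stand-in for the file (an assumption for illustration):

```python
import io

# Stand-in for open("file", "r"); two columns are enough to show the idea.
f = io.StringIO(
    "CW  0.000000  0.003822\n"
    "CW  0.007136  0.019635\n"
    "CS  0.022060  0.288761\n"
)

di = {}
for line in f:
    fields = line.split()
    if not fields:  # skip blank lines
        continue
    label = fields[0]
    # dict.get returns 0 for a label that has not been seen yet
    di[label] = di.get(label, 0) + 1

for k, v in di.items():
    print("Label: %s and Count: %d" % (k, v))
```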

edited Feb 15 '20 at 06:41

answered Feb 15 '20 at 06:23

abhiarora


Thanks for the comment. I tried this, and this is the output I got: Line: CW and Count: 1 Line: CW and Count: 1 Line: CW and Count: 1 – Sukrut Shishupal Feb 15 '20 at 06:39
See the output that I have added in my answer. There is something wrong with your file! – abhiarora Feb 15 '20 at 06:42
My file is .DIST file, maybe that's why it wasn't able to give me the answer. – Sukrut Shishupal Feb 15 '20 at 07:10

ASI · Answer 3 · 2020-02-15T07:04:46.323

Python 3.8.1. I hope this helps too. I tried to write a working example with explanations alongside the code, so you can follow what is happening.

# Global variables
file = "lista.txt"
countDictionary = {}

# Read the file and count the items
def ReadFile(fileName):
    # try is optional; it is used to catch errors and handle them
    # except is only needed because try is used
    try:
        # Open the file in read mode
        with open(fileName, mode="r") as f:
            # Read the first line
            line = f.readline()
            # Loop over every line in the file
            while line:
                # Strip surrounding whitespace (e.g. \n, \r) and keep the first 2 characters
                # We will call it item (I assume that CW and CS are some data)
                item = line.strip()[:2]

                # Counting logic
                # Each dictionary entry has two parts; I call them data and info
                # Data is the key (name/nickname/id) of the information
                # Info is the value (the information) for this data
                # First check whether the data is new; if so, set info to the integer 1
                if item not in countDictionary.keys():
                    countDictionary[item] = 1
                # If it is not new, update the count
                else:
                    info = countDictionary[item]    # get the current count
                    countDictionary[item] = info+1  # increase the count by one

                # Go to the next line by reading into line again
                # Without this, the loop would run forever on the first line
                line = f.readline()

        # This is optional too: the with block already closes the file automatically
        # But I like to be sure
        f.close()

    # In case the file does not exist
    except FileNotFoundError:
        print(f"ERROR >> File \"{fileName}\" does not exist!")

# Execute the function
ReadFile(file)

# Testing the dictionary count
for k,j in countDictionary.items():
    print(k, ">>", j)

Console output:

========================= RESTART: D:\Python\StackOverflow\help.py =========================
CW >> 3
CS >> 3
>>>

File lista.txt:

CW  0.000000  0.003822  0.006380  0.005100  0.016987  0.307042 1
CW  0.007136  0.019635  0.329683  0.315180  0.302634  0.007076 1
CW  0.015666  0.299244  0.290860  0.292623  0.325943  0.005236 1
CS  0.022060  0.288761  0.311449  0.289165  0.289937  0.317213 1
CS  0.019635  0.040511  0.301167  0.011418  0.295902  0.017166 1
CS  0.020990  0.345277  0.352370  0.034237  0.020962  0.015749 1

You should get the same output. Make sure the file variable is a name or a path to the file. After seeing the other answers, I realized my example counts only identical lines, because you showed us just plain CW and CS text as lines. In this case you need to check only the first 2 characters **(update: item = line.strip()[:2])**. I will update my code. — ASI, Feb 15 '20 at 07:03
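
The readline loop and the if/else counting branch in this answer can be condensed by iterating over the file object directly and using `collections.defaultdict`. This is a sketch of an alternative, not the answer's own code: `count_prefixes` is a hypothetical name, and it keeps the answer's assumption that the label is the first two characters of each line (unlike the original, it skips empty lines):

```python
import io
from collections import defaultdict

def count_prefixes(lines, width=2):
    """Count the first `width` characters of each non-empty stripped line."""
    counts = defaultdict(int)  # missing keys start at 0
    for line in lines:
        item = line.strip()[:width]
        if item:
            counts[item] += 1
    return dict(counts)

# Stand-in for the lines of lista.txt.
sample = io.StringIO("CW  0.0 1\nCW  0.1 1\nCS  0.2 1\n")
result = count_prefixes(sample)
print(result)
```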

score 0 · Answer 4 · answered Feb 15 '20 at 06:54

0

You can try the following code.

>>> text = '''CW  0.000000  0.003822  0.006380  0.005100  0.016987  0.307042
... CW  0.007136  0.019635  0.329683  0.315180  0.302634  0.007076
... CW  0.015666  0.299244  0.290860  0.292623  0.325943  0.005236
... CS  0.022060  0.288761  0.311449  0.289165  0.289937  0.317213
... CS  0.019635  0.040511  0.301167  0.011418  0.295902  0.017166
... CS  0.020990  0.345277  0.352370  0.034237  0.020962  0.015749'''
>>> items = [line.split()[0] for line in text.splitlines()]
>>> val = set([line.split()[0] for line in text.splitlines()])
>>> for item in val:
...     print(f'{items.count(item)} #For {item}')
...
3 #For CW
3 #For CS
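
A set has no guaranteed iteration order, so the loop above may print CS before CW. If first-seen order matters, one option is `dict.fromkeys`, which removes duplicates while keeping insertion order (Python 3.7+). A sketch with a shortened, hypothetical `text`:

```python
text = """CW  0.000000  0.003822
CW  0.007136  0.019635
CS  0.022060  0.288761"""

items = [line.split()[0] for line in text.splitlines()]
# dict.fromkeys removes duplicates while keeping first-seen order.
unique_in_order = list(dict.fromkeys(items))

for item in unique_in_order:
    print(f"{items.count(item)} #For {item}")
```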

answered Feb 15 '20 at 06:54

Adem Öztaş


AttributeError: '_io.TextIOWrapper' object has no attribute 'splitlines' – Sukrut Shishupal Feb 15 '20 at 06:59
@Dustrokes which python version are you using? https://docs.python.org/3/library/stdtypes.html?highlight=splitlines#bytes.splitlines – Adem Öztaş Feb 15 '20 at 07:18
I'm using python 3.7, will update it, and then try your code. – Sukrut Shishupal Feb 15 '20 at 07:28
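
The AttributeError quoted above does not depend on the Python version: `splitlines` is a method of `str` (and `bytes`), not of the file object returned by `open`. Reading the file into a string first avoids it. A sketch, using a temporary file as a stand-in for the real one (the file contents are illustrative):

```python
import os
import tempfile

# Create a small stand-in file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("CW  0.000000\nCW  0.007136\nCS  0.022060\n")
    path = tmp.name

with open(path) as f:
    has_splitlines = hasattr(f, "splitlines")  # False: file objects lack it
    text = f.read()                            # read() returns a str
    items = [line.split()[0] for line in text.splitlines()]

os.remove(path)
print(has_splitlines, items)
```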
