Count consecutive occurrences of values in a .txt file

Question

I have a .txt file in which two words repeat, one word per line.

Here is an example (the actual file is about 80,000 lines long):

ANS
ANS
ANS
AUT
AUT
AUT
AUT
ANS
ANS
ANS
ANS
ANS

I am trying to write some Python code that counts consecutive identical lines and returns the length of each run. For this example I would like to write [3, 4, 5] to another .txt file.

word = "100011010"
count = 1
length = ""

for i in range(1, len(word)):
    if word[i-1] == word[i]:
        count += 1
    else:
        length += word[i-1] + " repeats " + str(count) + ", "
        count = 1

length += "and " + word[i] + " repeats " + str(count)
print(length)

The concept is similar to the above code for a string. Is there a way to do this with a list?

You can use exactly the same code. Just replace `word` with `your_list` :) – Mr. E, Apr 13 '16 at 14:25
Sorry, I worded the last question badly. Right now the information is in a .txt file; would I have to convert that to a list? – slynes, Apr 13 '16 at 14:30
Would there be any way to read it directly from the .txt file? Can a single line be picked out of it, the way word[2] picks a character? Is there a way to say line[2]? – slynes, Apr 13 '16 at 14:31
@slyness not sure what your application is, but you may also find this applicable: http://stackoverflow.com/questions/24342047/count-consecutive-occurences-of-values-varying-in-length-in-a-numpy-array – pyInTheSky, Apr 13 '16 at 14:43

Mr. E · Answer 1 · 2016-04-13T19:28:17.190

2

You can read the entire file like this:

content = []
with open('/path/to/file.txt', 'r') as file:
    content = file.readlines()
    # Maybe you want to strip the trailing newlines as well
    # content = [line.strip() for line in content]

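Now `content` is a list with all the lines of the file.

def count_consecutive_lines(lines):
    if not lines:
        return ''
    counter = 1
    output = ''
    for index in range(1, len(lines)):
        if lines[index] == lines[index-1]:
            counter += 1
        else:
            # The run ended on the previous line, so report that value
            output += '{} repeats {} times.\n'.format(lines[index-1], counter)
            counter = 1
    # Report the final run, which the loop never closes
    output += '{} repeats {} times.\n'.format(lines[-1], counter)
    return output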

Then call it like this:

print(count_consecutive_lines(content))
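
As an alternative sketch, not part of the original answer, the same run counting can be written with `itertools.groupby` from the standard library, which groups consecutive equal items. The function name `run_lengths` and the `sample` list are made up for illustration; the sample mirrors the data in the question.

```python
from itertools import groupby

def run_lengths(lines):
    """Return a list of (value, count) pairs, one per consecutive run."""
    stripped = (line.strip() for line in lines)
    # groupby starts a new group every time the value changes
    return [(value, sum(1 for _ in group)) for value, group in groupby(stripped)]

sample = ["ANS", "ANS", "ANS", "AUT", "AUT", "AUT", "AUT",
          "ANS", "ANS", "ANS", "ANS", "ANS"]
runs = run_lengths(sample)
print(runs)                          # [('ANS', 3), ('AUT', 4), ('ANS', 5)]
print([count for _, count in runs])  # [3, 4, 5]
```

Keeping the value next to each count makes it possible to tell the two ANS runs apart, and the second line recovers the plain list of counts the question asks for.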

edited Apr 13 '16 at 19:28

answered Apr 13 '16 at 14:38

Mr. E


thank you I will try this out. Could I do a file.write('\n'.join) to write the counts to another .txt file (\n for a new line for each)? – slynes Apr 13 '16 at 15:08
@slynes Yes, you can – Mr. E Apr 13 '16 at 15:27

totoro · Answer 2 · 2016-04-14T00:51:07.947

1

An answer that doesn't load the whole file into memory:

last = None
count = 0
result = []

with open('sample.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if line == last:
            count = count + 1
        else:
            if count > 0:
                result.append(count)
            count = 1
            last = line

    # Record the final run (skipped when the file is empty)
    if count > 0:
        result.append(count)
    print(result)

Result:

[3, 4, 5]

UPDATE

The list contains integers, but `str.join` only accepts strings, so you have to convert each item first:

outFile.write('\n'.join(str(n) for n in result))
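
Putting the two pieces together, here is a minimal end-to-end sketch that streams the input and writes one count per line. The function names `count_runs` and `write_run_lengths` and the output file name used below are hypothetical and do not appear in the thread.

```python
def count_runs(lines):
    """Yield the length of each run of identical consecutive lines."""
    last = None
    count = 0
    for line in lines:
        line = line.strip()
        if line == last:
            count += 1
        else:
            if count > 0:
                yield count
            last = line
            count = 1
    if count > 0:  # emit the final run; nothing is emitted for empty input
        yield count

def write_run_lengths(in_path, out_path):
    """Stream in_path line by line and write one run length per line to out_path."""
    with open(in_path, 'r') as src, open(out_path, 'w') as dst:
        # join() needs strings, so each integer count is converted first
        dst.write('\n'.join(str(n) for n in count_runs(src)))
```

Calling `write_run_lengths('sample.txt', 'counts.txt')` on the sample data from the question should leave 3, 4 and 5 on separate lines in `counts.txt`. Because `count_runs` is a generator, only one input line is held in memory at a time.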

edited Apr 14 '16 at 00:51

answered Apr 13 '16 at 15:13

totoro


Thanks this worked great. instead of print at the end I am trying to write the result to a .txt file using outFile.write('\n'.join(result)) (outFile already defined), but it is not working for some reason – slynes Apr 13 '16 at 21:47
@slynes Updated the answer. – totoro Apr 14 '16 at 06:55

score 0 · Answer 3 · answered Apr 13 '16 at 14:34

0

You can try to convert the file data into a list and follow the approach given below:

with open("./sample.txt", 'r') as fl:
    fl_list = [line.strip() for line in fl]  # drop trailing newlines
    unique_data = sorted(set(fl_list))       # sorted for a stable output order
    for unique in unique_data:
        print("%s - count: %s" % (unique, fl_list.count(unique)))

#output:
ANS - count: 8
AUT - count: 4

answered Apr 13 '16 at 14:34

Mani



he does not want the total count of a word, rather, the consecutive count of a word, take a look at his example where ans appears in the desired final result, twice. – pyInTheSky Apr 13 '16 at 14:36

Not what the OP wanted .. counting consecutive similar items...not counting all similar items – Iron Fist Apr 13 '16 at 14:36
that would be good. but would I be able to get the count separately? for example ANS- count: 3, AUT - count: 4, ANS - count: 5. I need to record how many times AUT appears separately. – slynes Apr 13 '16 at 14:38

João Pedro · Answer 4 · 2016-04-13T14:39:32.243

0

Open your file and read it to count:

l = []
last = None
with open('data.txt', 'r') as f:
    for line in f:
        words = line.split()
        if not words:
            continue  # skip blank lines
        if words[0] == last:
            l[-1] += 1
        else:
            # A new run starts: record it and remember its value
            l.append(1)
            last = words[0]
print(l)

edited Apr 13 '16 at 14:39

answered Apr 13 '16 at 14:36

João Pedro


if condition not syntactically correct, assignment vs comp. – pyInTheSky Apr 13 '16 at 14:37

score 0 · Answer 5 · answered Apr 13 '16 at 14:44

Here is your expected output :)

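with open("./sample.txt", 'r') as fl:
    # Strip newlines so a last line without one still compares equal
    word = [line.strip() for line in fl]
    count = 1
    length = []
    for i in range(1, len(word)):
        if word[i-1] == word[i]:
            count += 1
        else:
            length.append(count)
            count = 1
    if word:  # do not report a run for an empty file
        length.append(count)
    print(length)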

#output as you expect:
[3, 4, 5]
