I have a .txt file that has two words repeating in separate lines.

Here is an example. (the actual one is about 80,000 lines long)

ANS
ANS
ANS
AUT
AUT
AUT
AUT
ANS
ANS
ANS
ANS
ANS

I am trying to develop some Python code to count the consecutive identical lines and return the length of each run. So for this example I would like to write [3,4,5] to another .txt file.

word = "100011010"
count = 1
length = ""

for i in range(1, len(word)):
    if word[i-1] == word[i]:
        count += 1
    else:
        length += word[i-1] + " repeats " + str(count) + ", "
        count = 1

length += "and " + word[i] + " repeats " + str(count)
print(length)

The concept is similar to the above code for a string. Is there a way to do this with a list?

Cœur
slynes
  • You can use exactly the same code. Just replace `word` with `your_list` :) – Mr. E Apr 13 '16 at 14:25
  • Sorry, I worded the last question in the wrong way. Right now the information is in a .txt file. Would I have to convert that to a list? – slynes Apr 13 '16 at 14:30
  • Would there be any way to read it directly from the .txt file? Can a certain line be singled out from it (such as word[2])? Is there a way to say line[2]? – slynes Apr 13 '16 at 14:31
  • @slyness not sure what your application is, but you may also find this applicable: http://stackoverflow.com/questions/24342047/count-consecutive-occurences-of-values-varying-in-length-in-a-numpy-array – pyInTheSky Apr 13 '16 at 14:43

5 Answers

You can read the entire file like this:

content = []
with open('/path/to/file.txt', 'r') as file:
    content = file.readlines()
    # Alternatively, strip the trailing newlines while reading:
    # content = [line.strip() for line in file]

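Now `content` is a list with all the lines of the file:

def count_consecutive_lines(lines):
    if not lines:
        return ''
    counter = 1
    output = ''
    for index in range(1, len(lines)):
        if lines[index] != lines[index-1]:
            # The run ended at the previous line, so report that line
            output += '{} repeats {} times.\n'.format(lines[index-1], counter)
            counter = 1
        else:
            counter += 1
    # Report the final run, which the loop never closes
    output += '{} repeats {} times.\n'.format(lines[-1], counter)
    return output

And call it like this: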

print(count_consecutive_lines(content))
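
To address the question in the comments about picking out a single line, here is a minimal sketch. It assumes the file holds one word per line; the helper name `read_lines` and the temporary file are hypothetical and only there to make the example self-contained.

```python
import os
import tempfile


def read_lines(path):
    """Return the file's lines as a list, without trailing newlines."""
    with open(path, 'r') as handle:
        return handle.read().splitlines()


# Build a small stand-in for the real file (sample data from the question).
sample = ['ANS'] * 3 + ['AUT'] * 4 + ['ANS'] * 5
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('\n'.join(sample) + '\n')
    tmp_path = tmp.name

lines = read_lines(tmp_path)
os.remove(tmp_path)

# A list supports indexing, so a single line can be picked out directly.
third_line = lines[2]
print(third_line)
print(len(lines))
```

Once the lines are in a list, `lines[2]` works the same way `word[2]` does on a string, so the string-based loop from the question carries over unchanged.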
Mr. E

An answer that doesn't load the whole file into memory:

last = None
count = 0
result = []

with open('sample.txt', 'rb') as f:
    for line in f:
        line = line.strip()
        if line == last:
            count = count + 1
        else:
            if count > 0:
                result.append(count)
            count = 1
            last = line

    result.append(count)
    print(result)

Result:

[3, 4, 5]
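
The same streaming idea can also be sketched with `itertools.groupby` from the standard library, which groups consecutive equal items. This is an alternative technique, not the loop shown above; the function name `run_lengths` is hypothetical.

```python
from itertools import groupby


def run_lengths(lines):
    """Return the length of each run of identical consecutive lines."""
    # The generator strips lines lazily, so the input is never held in full.
    stripped = (line.strip() for line in lines)
    return [sum(1 for _ in group) for _, group in groupby(stripped)]


# Sample data from the question, with newlines as a file would yield them.
sample = ['ANS\n'] * 3 + ['AUT\n'] * 4 + ['ANS\n'] * 5
print(run_lengths(sample))  # [3, 4, 5]
```

Because `run_lengths` accepts any iterable of lines, an open file object can probably be passed in directly, for example `run_lengths(f)` inside a `with open(...) as f:` block.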

UPDATE

The list contains integers, but `join` only accepts strings, so you have to convert each number first:

outFile.write('\n'.join(str(n) for n in result))
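
For completeness, a minimal sketch of the whole write step. The output path and the name `out_file` are assumptions for illustration (the question only says `outFile` is already defined), and a temporary directory is used so the example runs anywhere.

```python
import os
import tempfile

result = [3, 4, 5]  # run lengths for the sample data in the question

# Hypothetical output location; replace with the real output path.
out_path = os.path.join(tempfile.mkdtemp(), 'counts.txt')

with open(out_path, 'w') as out_file:
    # join() needs strings, so convert each count before joining.
    out_file.write('\n'.join(str(n) for n in result) + '\n')

# Read the file back to confirm what was written.
with open(out_path) as check:
    written = check.read()
print(written)
```

This writes one count per line. Joining with `', '` instead of `'\n'` would put them all on a single line.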
totoro
  • Thanks, this worked great. Instead of print at the end, I am trying to write the result to a .txt file using outFile.write('\n'.join(result)) (outFile is already defined), but it is not working for some reason – slynes Apr 13 '16 at 21:47
  • @slynes Updated the answer. – totoro Apr 14 '16 at 06:55

You can try to convert the file data into a list and follow the approach given below:

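with open("./sample.txt", 'r') as fl:
    # strip newlines so the words print cleanly and compare equal
    fl_list = [line.strip() for line in fl]
    unique_data = set(fl_list)
    for unique in unique_data:
        print("%s - count: %s" % (unique, fl_list.count(unique)))

#output (order may vary, since sets are unordered):
ANS - count: 8
AUT - count: 4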
Mani
  • He does not want the total count of a word but the consecutive count. Take a look at his example, where ANS appears twice in the desired final result. – pyInTheSky Apr 13 '16 at 14:36
  • Not what the OP wanted: counting consecutive similar items, not counting all similar items – Iron Fist Apr 13 '16 at 14:36
  • That would be good, but would I be able to get the count separately? For example ANS - count: 3, AUT - count: 4, ANS - count: 5. I need to record how many times AUT appears separately. – slynes Apr 13 '16 at 14:38
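
For the per-run labelled output asked about in the last comment, a minimal sketch follows. It keeps each word together with the length of its run rather than a total per word; the function name `labelled_runs` is hypothetical.

```python
def labelled_runs(lines):
    """Return (word, count) pairs, one per run of identical consecutive lines."""
    runs = []
    for line in lines:
        word = line.strip()
        if runs and runs[-1][0] == word:
            # Same word as the current run: extend that run by one.
            runs[-1] = (word, runs[-1][1] + 1)
        else:
            # A different word starts a new run.
            runs.append((word, 1))
    return runs


# Sample data from the question, with newlines as a file would yield them.
sample = ['ANS\n'] * 3 + ['AUT\n'] * 4 + ['ANS\n'] * 5
for word, count in labelled_runs(sample):
    print('%s - count: %s' % (word, count))
```

The key difference from counting with `list.count` is that a repeated word such as ANS gets a separate entry each time a new run of it begins.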

Open your file and read it to count:

l = []
last = ''
with open('data.txt', 'r') as f:
    for line in f:
        words = line.split()
        if not words:
            continue  # skip blank lines, which have no first word
        if words[0] == last:
            l[-1] += 1
        else:
            # a new run starts: record it and remember its word
            l.append(1)
            last = words[0]
print(l)
João Pedro

Here is your expected output :)

with open("./sample.txt", 'r') as fl:
    # strip newlines so a last line without one still matches
    word = [line.strip() for line in fl]
    count = 1
    length = []
    for i in range(1, len(word)):
        if word[i-1] == word[i]:
            count += 1
        else:
            length.append(count)
            count = 1
    length.append(count)
    print(length)

#output as you expect:
[3, 4, 5]
Mani