Generate string with length equal to length of time in file, with 1 label per second, Python

Question

I have a file like this:

https://gist.github.com/manbharae/70735d5a7b2bbbb5fdd99af477e224be

I want to generate one label per second.

Since the file above is 160 seconds long, there should be 160 labels. In other words, I want to generate a string of length 160.

However, I end up with a string of length 166 instead of 160.

My code:

    import os

    filename = './test_file.txt'
    ann = []

    with open(filename, 'r') as f:
        for line in f:
            # Tab-separated fields: the second is the end time, the third the label
            _, end, label = line.strip().split('\t')
            ann.append((int(float(end)), 'MIT' if label == 'MILAN' else 'not-MIT'))

    str = ''
    prev_value = 0
    for s in ann:
        value = s[0]
        letter = 'M' if s[1] == 'MIT' else 'x'
        # Append one letter per second elapsed since the previous end time
        str += letter * (value - prev_value)
        print(str)
        prev_value = value

    name_of_file, file_ext = os.path.splitext(os.path.basename(filename))
    print("\n\nfile_name processed: " + name_of_file)
    print(str)
    print("length of string %d\n\n" % len(str))

My final output:

xxxxxxxMxMMMMxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxMMMMMMMMMMMMMMMMMMMMxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Its length is 166.

That is wrong. The string should be 160 characters long, one character per second, because the file is 160 seconds long.

There is a small bug somewhere that I am unable to find. Please advise: what is going wrong here?

Thanks.

One thing I tried was adding an if condition to break out of the loop once a length of 160 is reached, like this:

    if ann[len(ann)-1][0] == len(str):
        break

However, it didn't help. AFAIK, something goes wrong in the last iteration, because everything is fine until then.

I also looked at:
https://stackoverflow.com/a/14927311/4932791
https://stackoverflow.com/a/1424016/4932791

I think it would be really helpful if you could share your original input file as well - test_file.txt — GSazheniuk, Feb 27 '18 at 15:21
sorry , I had just fixed the typo. The original input file is in the gist link on line 2. once again here it is https://gist.githubusercontent.com/manbharae/70735d5a7b2bbbb5fdd99af477e224be/raw/7579500f1d821d08a88dbd3d83131f4b0ca4066e/gistfile1.txt — kRazzy R, Feb 27 '18 at 15:24

score 3 · Answer 1 · answered Feb 27 '18 at 15:46


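The lengths don't add up because in two places the end value is lower than the previous one, so the code tries to add a negative number of letters:

(69, 'not-MIT')
(68, 'not-MIT')

(76, 'not-MIT')
(71, 'not-MIT')

Multiplying a string by a negative number gives an empty string, so nothing is added on those lines. But prev_value still drops to the lower value, so the following line adds too many letters: 1 extra after the drop from 69 to 68 and 5 extra after the drop from 76 to 71, which accounts for the 6 surplus characters.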
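To see the effect in isolation, here is a minimal sketch in Python. The list of end times is made up for illustration; only the two drops (69 to 68 and 76 to 71) are taken from the data above.

```python
# Repeating a string a negative number of times yields an empty string.
assert 'x' * -1 == ''

# Hypothetical end times; only the drops 69 -> 68 and 76 -> 71
# mirror the pairs found in the real file.
ends = [60, 69, 68, 70, 76, 71, 80]

labels = ''
prev = 0
for end in ends:
    # Same arithmetic as in the question: letters for (end - prev) seconds.
    labels += 'x' * (end - prev)
    prev = end

# The string is longer than the last end time by the size of the drops.
surplus = len(labels) - max(ends)
print(len(labels), max(ends), surplus)
```

With these sample values the surplus equals the sum of the two drops (1 + 5), the same gap as between 166 and 160 in the question.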

For future reference: it's better not to name a variable 'str', because 'str' is already a built-in type in Python and reusing the name shadows it.
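One possible way to guard against this is sketched below. It assumes that an end time at or before a second that is already labelled can simply be skipped; whether that is the right treatment depends on what the overlapping lines in the file mean. The function name build_labels and the sample ann list are hypothetical, and the result is stored under a name other than 'str'.

```python
def build_labels(ann):
    """Return one character per second for (end_second, label) pairs."""
    chars = []
    covered = 0  # number of seconds already labelled
    for end, label in ann:
        if end <= covered:
            # Assumption: an end time that does not move forward adds nothing.
            continue
        letter = 'M' if label == 'MIT' else 'x'
        chars.append(letter * (end - covered))
        covered = end
    return ''.join(chars)


# Hypothetical sample; it includes the two out-of-order pairs from the answer.
ann = [
    (7, 'not-MIT'), (8, 'MIT'),
    (69, 'not-MIT'), (68, 'not-MIT'),
    (76, 'not-MIT'), (71, 'not-MIT'),
    (80, 'MIT'),
]
labels = build_labels(ann)
print(len(labels))
```

Because the counter only ever moves forward, the length of the result equals the largest end time in the input, regardless of the order of the lines.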

Nathan


wow! very keen observation! The data is the culprit then! I was pulling my hairs for an entire day unable to figure out what was wrong with the logic. – kRazzy R Feb 27 '18 at 16:02
Sure :) If you're satisfied with the answer, can you accept it? – Nathan Feb 27 '18 at 16:06
