0

Given a string like '0100011' and a length = 2 we have to count the frequency of pairs in the string: 00,01,10,11.

00 -> 2 , 01 -> 2 , 10 ->1 , 11->1

The problem is if I just use the count('00') function, it doesn't work since it only finds one occurrence: " _ _ 000 _ _".

I've been wondering what the right approach to this kind of problem would be, since for the moment i've tried to create some sort of substring checker with 2 pointer(index and length) which goes back and forth to be sure it doesn't miss any combination but it doesn't feel right

Mark
  • 90,562
  • 7
  • 108
  • 148
  • [String count with overlapping occurrences](https://stackoverflow.com/q/2970520/674039) – wim Jun 23 '22 at 02:01

2 Answers2

1

If you want to find the number of occurrences of a single binary substring, you can use string slicing to get each consecutive pair of letters:

data = "0100011"
target = "00"

sum(data[start:start+len(target)] == target for start in range(len(data) - len(target)))

This outputs:

2

If you want to find the number of occurrences of all binary strings of a particular length, you could use a collections.Counter instead:

Counter(data[start:start+len(target)] for start in range(len(data) - len(target)))

This outputs:

Counter({'01': 2, '00': 2, '10': 1})

(Note that if you try to look up the frequency of a binary string that doesn't occur, the Counter will simply return 0.)

BrokenBenchmark
  • 18,126
  • 7
  • 21
  • 33
1

here is how I did it:

length = 2
s = '0100011'

counter = {}

for i in range(2**length):
    b = f'{i:0{length}b}'
    for j in range(len(s)-1):
        if b == s[j:j+length]:
            counter[b] = counter.get(b, 0) + 1

print(counter)

will produce:

{'00': 2, '01': 2, '10': 1, '11': 1}

for length == 3:

{'000': 1, '001': 1, '010': 1, '011': 1, '100': 1}

...

for length == 7, obviously:

{'0100011': 1}
jabbson
  • 4,390
  • 1
  • 13
  • 23
  • Can you please clarify " b = f'{i:0{length}b}' " . I get it generate all the binary combination but I don't really get the syntax you're using. I always saw colon for slicing inside a [ ] but what is that and how is so efficient ahah, sorry but i'm not getting it. – kindastoned Jul 12 '22 at 23:03
  • @kindastoned basically this is one string interpolation formatter inside another. With the length equal to 2, the f-string of `f'{i:0{length}b}'` will become `f'{i:02b}'`, which formats/outputs the `i` in its binary representation, zero-padded to 2 character. See the format specification language [here](https://docs.python.org/3/library/string.html#format-specification-mini-language) – jabbson Jul 13 '22 at 02:57
  • that's cool, thanks ^ ^ – kindastoned Jul 14 '22 at 04:27
  • hey, sorry to bother but I might have another question, if the lenght is very long (i start getting problems for lenght > 20) my program takes too much time to check every combination. I was wondering do you have any idea how to improve the checking ? – kindastoned Jul 14 '22 at 08:18
  • I've clarified my question and posted the block of code I've done, I would appreciate if you go check it https://stackoverflow.com/questions/72986509/substring-research-of-a-binarystring-how-to-improve-this-program-in-terms-of-sp – kindastoned Jul 14 '22 at 20:50