Group string by similar substrings

Question

I'm trying to take a number, convert it into binary, and then produce a list of the length of like terms.

For example, n=49 is represented in binary as "110001". I would like to return a list of the length of ["11", "000", "1"], which is [2, 3, 1].

So far I'm able to convert a number into binary using "{0:b}".format(n) but I cannot group the like terms.

Does this answer your question? [Count consecutive characters](https://stackoverflow.com/questions/34443946/count-consecutive-characters) — mkrieger1, May 30 '22 at 13:52
`from itertools import groupby; result = [sum(1 for _ in g) for _,g in groupby("{0:b}".format(n))]` — Stef, May 30 '22 at 14:02

score 1 · Answer 1 · answered May 30 '22 at 13:54

1

You might use itertools.groupby for this task as follows

import itertools
t = "110001"
lengths = [len(list(g)) for k,g in itertools.groupby(t)]
print(lengths)

output

[2, 3, 1]

itertools.groupby does found runs of consecutive same elements (unless 2nd argument is given), g here is iterator which is used to create list which length is measured. Observe that you can not do len(g).

answered May 30 '22 at 13:54

Daweo

31,313
3
12
25

1

It would be more `complete` to include this `n = bin(n)[2:]` before doing the lengths processing. ;-) As it's more aligned with OP's original ask. – Daniel Hao May 30 '22 at 14:28

mrCopiCat · Answer 2 · 2022-05-30T14:58:45.387

0

You ca use this after converting your number (careful, it should be in string):

from itertools import groupby

bin_number = "110001"
result = ["".join(group) for ele, group in groupby(bin_number)]
print(result)

output:

["11", "000", "1"]

edited May 30 '22 at 14:58

answered May 30 '22 at 13:55

mrCopiCat

899
3
15

ah sorry! , add this import : `from itertools import groupby` – mrCopiCat May 30 '22 at 14:58

965311532 · Answer 3 · 2022-05-30T14:16:56.123

You can do it like this:

from itertools import groupby

def binary_chunks(n: int) -> list:
    return [len("".join(g)) for _, g in groupby(f"{n:b}")]

For example:

>>> binary_chunks(49) # binary: 110001
>>> [2, 3, 1]

>>> binary_chunks(4) # binary: 100
>>> [1, 2]

>>> binary_chunks(123456789) # binary: 111010110111100110100010101
>>> [3, 1, 1, 1, 2, 1, 4, 2, 2, 1, 1, 3, 1, 1, 1, 1, 1]

I also ran a timeit test if you're interested:

>>> %timeit binary_chunks(49)
>>> 2.28 µs ± 208 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

rici · Answer 4 · 2022-05-30T18:07:18.500

You can also solve this problem with the re package:

import re
def bit_run_lengths(n, pat=re.compile('1+|0+')):
  return [
           len(m)
           for m in pat.findall(bin(n)[2:])
         ]

According to my benchmarking, that's about twice as fast as the itertools.groupby solution.

For a more general (single-character) run-length converter, you can use a back-reference. (This is the rare case in which back references are safe to use.) It's slightly slower, but it still beats groupby by a bit:

def run_length(s, pat=re.compile(r'((.)\2*)')):
  return [len(m[0]) for m in pat.findall(s)]

>>> run_length("aabbbbbbbzxxaaa")
[('a', 2), ('b', 7), ('z', 1), ('x', 2), ('a', 3)]

Group string by similar substrings

4 Answers4