How to extract each word consecutive to its own previous number in a string and sorting the result in Python

Question

Input : x3b4U5i2 Output : bbbbiiUUUUUxxx

How can I solve this problem in Python? I have to print each word n times, where n is the number that follows it, and sort the result.

Can you have more than 9 repeats? Are the characters to repeat always letters? — mozway, Oct 22 '22 at 05:19

mozway · Answer 1 · 2022-10-22T06:46:52.240

0

One option: extract the character/digit(s) pairs with a regex, sort them by letter (ignoring case), multiply each letter by its number of repeats, and join:

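import re

s = 'x3b4U5i2'

# extract (character, count) pairs, sort them by character ignoring case,
# then repeat each character by its count and join the pieces
out = ''.join([c*int(i) for c,i in 
               sorted(re.findall(r'(\D)(\d+)', s),
                      key=lambda x: x[0].casefold())
              ])

print(out)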

Output: bbbbiiUUUUUxxx

If you want to handle multiple characters, you can use r'(\D+)(\d+)' instead.
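As a minimal sketch of that variant (the function name `expand_sorted` is hypothetical, not part of the answer), the multi-character pattern can be wrapped in a function. It assumes the input always alternates a run of non-digits with a run of digits:

```python
import re

def expand_sorted(s):
    # Capture (non-digit run, digit run) pairs, e.g. ('brx', '4').
    pairs = re.findall(r'(\D+)(\d+)', s)
    # Sort the pairs by their text, ignoring case, before expanding.
    pairs.sort(key=lambda p: p[0].casefold())
    # Repeat each text by its count and join the pieces.
    return ''.join([text * int(count) for text, count in pairs])

print(expand_sorted('x3b4U5i2'))    # bbbbiiUUUUUxxx
print(expand_sorted('x3brx4U5i2'))  # brxbrxbrxbrxiiUUUUUxxx
```

Because the pairs are sorted before they are expanded, a multi-character group such as `brx` stays intact in the output.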

edited Oct 22 '22 at 06:46

answered Oct 22 '22 at 05:25

mozway

@CryptoFool I just noticed that as well while reading the question again to proofread my answer. It is fixed ;) – mozway Oct 22 '22 at 05:32
You can provide a generator expression to `str.join` rather than a list comprehension. – Chris Oct 22 '22 at 06:19
@Chris yes, but [it's less efficient](https://stackoverflow.com/questions/37782066/list-vs-generator-comprehension-speed-with-join-function) as `join` needs to know the length of its input – mozway Oct 22 '22 at 06:21
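One way to check this claim on your own machine is a small `timeit` comparison. This is only a sketch: the repeated input string and the helper names `with_list` and `with_generator` are assumptions made for illustration, and the timings will vary by Python version and hardware.

```python
import re
import timeit

# Hypothetical larger input: the sample string repeated to make timing visible.
s = 'x3b4U5i2' * 1000
pairs = sorted(re.findall(r'(\D)(\d+)', s), key=lambda p: p[0].casefold())

def with_list():
    # join receives a fully built list
    return ''.join([c * int(n) for c, n in pairs])

def with_generator():
    # join receives a generator and has to consume it first
    return ''.join(c * int(n) for c, n in pairs)

# Both forms build the same string; only the timing may differ.
assert with_list() == with_generator()

print('list:     ', timeit.timeit(with_list, number=200))
print('generator:', timeit.timeit(with_generator, number=200))
```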

EduGord · Answer 2 · 2022-10-22T06:14:10.470

0

I'm assuming the formatting will always be <char><int> with <int> being in between 1 and 9...

input_ = "x3b4U5i2"

result_list = [input_[i]*int(input_[i+1]) for i in range(0, len(input_), 2)]
result_list.sort(key=str.lower)
result = ''.join(result_list)

There's probably a much more performance-oriented approach to solving this; it's just the first solution that came to my limited mind.

Edit

After the feedback in the comments, I tried to improve performance by sorting first, but I actually decreased performance with the following implementation:

input_ = "x3b4U5i2"

def sort_first(value):
    return value[0].lower()

tuple_construct = [(input_[i], int(input_[i+1])) for i in range(0, len(input_), 2)]
tuple_construct.sort(key=sort_first)
result = ''.join([tc[0] * tc[1] for tc in tuple_construct])

Execution time for 100,000 iterations on it:

1) The execution time is: 0.353036
2) The execution time is: 0.4361724

edited Oct 22 '22 at 06:14

answered Oct 22 '22 at 05:31

EduGord

Since you mention performance, the drawback of your approach is that you first generate the expanded string and **then** sort it. Since sorting is O(n*logn), this is more expensive than sorting before expanding. ;) – mozway Oct 22 '22 at 05:38
Also, no need to convert your string to `ord`, python knows how to sort strings :) – mozway Oct 22 '22 at 05:41
For your comparison to be meaningful you need to test **large** inputs. The O(n*logn) complexity has an impact when n is large. For small inputs it's negligible. – mozway Oct 22 '22 at 06:03
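A rough way to see the effect is to time both orders on an input whose counts are large. The sketch below is not a rigorous benchmark: the input string and the helper names `sort_then_expand` and `expand_then_sort` are assumptions chosen for illustration.

```python
import re
import timeit

# Example input with large counts: 2 pairs expand to 200,000 characters.
s = 'x100000a100000'

def sort_then_expand():
    # Sort the few (char, count) pairs, then build the long string.
    pairs = sorted(re.findall(r'(\D)(\d+)', s), key=lambda p: p[0].casefold())
    return ''.join([c * int(n) for c, n in pairs])

def expand_then_sort():
    # Build the long string first, then sort every character in it.
    expanded = re.sub(r'(\D)(\d+)', lambda m: m.group(1) * int(m.group(2)), s)
    return ''.join(sorted(expanded, key=str.casefold))

# Both orders give the same result for single-character groups.
assert sort_then_expand() == expand_then_sort()

print('sort, then expand:', timeit.timeit(sort_then_expand, number=5))
print('expand, then sort:', timeit.timeit(expand_then_sort, number=5))
```

Sorting first only has to order two pairs here, while sorting afterwards has to order all 200,000 characters.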

CryptoFool · Answer 3 · 2022-10-22T05:51:56.837

0

It wasn't clear if multiple digit counts or groups of letters should be handled. Here's a solution that does all of that:

import re

def main(inp):
    parts = re.split(r"(\d+)", inp)
    parts_map = {parts[i]:int(parts[i+1]) for i in range(0, len(parts)-1, 2)}
    print(''.join([c*parts_map[c] for c in sorted(parts_map.keys(),key=str.lower)]))

main("x3b4U5i2")
main("x3brx4U5i2")
main("x23b4U35i2")

Result:

bbbbiiUUUUUxxx
brxbrxbrxbrxiiUUUUUxxx
bbbbiiUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUxxxxxxxxxxxxxxxxxxxxxxx

edited Oct 22 '22 at 05:51

answered Oct 22 '22 at 05:44

CryptoFool

This is more or less my approach except that the use of a dictionary intermediate would make it fail on input like `'x3b1x2'` ;) NB. I considered single chars myself but using `'(\D+)(\d+)'` would make it work with multichar as you did. – mozway Oct 22 '22 at 05:58
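To illustrate the difference, the sketch below runs both intermediates on `'x3b1x2'`. It assumes the desired behaviour is to keep every pair, so that repeated keys contribute all of their counts; the variable names are hypothetical.

```python
import re

inp = 'x3b1x2'
parts = re.split(r'(\d+)', inp)  # ['x', '3', 'b', '1', 'x', '2', '']

# Dictionary intermediate: the second 'x' overwrites the first one.
as_dict = {parts[i]: int(parts[i + 1]) for i in range(0, len(parts) - 1, 2)}
from_dict = ''.join(k * as_dict[k] for k in sorted(as_dict, key=str.lower))

# List of pairs: repeated keys are all kept.
as_pairs = [(parts[i], int(parts[i + 1])) for i in range(0, len(parts) - 1, 2)]
as_pairs.sort(key=lambda p: p[0].lower())
from_pairs = ''.join(text * count for text, count in as_pairs)

print(from_dict)   # bxx
print(from_pairs)  # bxxxxx
```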

Chris · Answer 4 · 2022-10-22T17:00:44.677

0

No list comprehensions or generator expressions in sight. Just using re.sub with a lambda to expand the length encoding, then sorting that, and then joining that back into a string.

import re

s = "x3b4U5i2"

''.join(sorted(re.sub(r"(\D+)(\d+)", 
                      lambda m: m.group(1)*int(m.group(2)), 
                      s),
        key=lambda x: x[0].casefold()))
# 'bbbbiiUUUUUxxx'

If we use re.findall to extract a list of pairs of strings and multipliers:

import re

s = 'x3b4U5i2'

pairs = re.findall(r"(\D+)(\d+)", s)

Then we can use some functional style to sort that list before expanding it.

from operator import itemgetter

def compose(f, g): 
  return lambda x: f(g(x))

sorted(pairs, key=compose(str.lower, itemgetter(0)))
# [('b', '4'), ('i', '2'), ('U', '5'), ('x', '3')]
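The remaining expansion step is not shown in the answer. One possible way to finish it in the same functional style is sketched below, using `map` so that no comprehension is needed; the names `ordered` and `result` are assumptions.

```python
import re
from operator import itemgetter

def compose(f, g):
    return lambda x: f(g(x))

s = 'x3b4U5i2'
pairs = re.findall(r"(\D+)(\d+)", s)

# Sort the (text, count) pairs by lowercased text.
ordered = sorted(pairs, key=compose(str.lower, itemgetter(0)))

# Expand each pair into repeated text and join the pieces.
result = ''.join(map(lambda p: p[0] * int(p[1]), ordered))
print(result)  # bbbbiiUUUUUxxx
```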

edited Oct 22 '22 at 17:00

answered Oct 22 '22 at 06:27

Chris

Same remark as for @EduGord: if the expanded string is much larger than the original, it's much less efficient to sort afterwards. Simple example: `x100000a100000`. – mozway Oct 22 '22 at 06:39
That's completely fair. Your approach was the first thing to come to my mind, but there'd be no point in posting the same answer twice. – Chris Oct 22 '22 at 06:43
