Python: Expanding a string of variables with integers

Question

I'm still new to Python and learning the basics of programming. Right now I'm trying to create a function that duplicates each letter in a string according to the number next to it.

Example:

>>> expand('d3f4e2')
'dddffffee'

I'm not sure how to write the function for this. Basically, I understand that you want to multiply each letter by the number beside it.

Can the numbers be `>9`, or do they have to be a single digit? — abarnert, Sep 24 '14 at 01:10
Umm...I i think it's single digit numbers. I was never told if it had to be double digits so i'm going with single. — ChesneyD, Sep 24 '14 at 01:12
@mshsayem: Why did you delete your answer? It could definitely use some explanation, and it obviously won't work if multi-digit numbers are allowed, but it certainly demonstrated a working answer. — abarnert, Sep 24 '14 at 01:12
I deleted it because I did not consider the multi digit possibility. Undeleted now. — mshsayem, Sep 24 '14 at 01:13

score 3 · Answer 1 · answered Sep 24 '14 at 01:29

The key to any solution is splitting things into pairs of strings to be repeated, and repeat counts, and then iterating those pairs in lock-step.

If you only need single-character strings and single-digit repeat counts, this is just breaking the string up into 2-character pairs, which you can do with mshsayem's answer, or with slicing (s[::2] is the strings, s[1::2] is the counts).
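The slicing variant mentioned above can be sketched as follows. This is a minimal illustration that assumes every letter is followed by exactly one digit; the name expand_sliced is made up for this example and does not appear in any of the answers.

```python
def expand_sliced(s):
    # Characters at even positions are the strings to repeat,
    # characters at odd positions are the single-digit repeat counts.
    chars = s[::2]
    counts = s[1::2]
    return ''.join(c * int(d) for c, d in zip(chars, counts))

print(expand_sliced('d3f4e2'))  # dddffffee
```

Because zip() stops at the shorter input, a trailing letter with no digit after it is silently dropped in this sketch.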

But what if you want to generalize this to multi-letter strings and multi-digit counts?

Well, somehow we need to group the string into runs of digits and non-digits. If we could do that, we could use pairs of those groups in exactly the same way mshsayem's answer uses pairs of characters.

And it turns out that we can do this very easily. There's a nifty function in the standard library, itertools.groupby, that lets you group anything into runs according to any key function. And there's a string method, str.isdigit, that distinguishes digits from non-digits.

So, this gets us the runs we want:

>>> import itertools
>>> s = 'd13fx4e2'
>>> [''.join(group) for (key, group) in itertools.groupby(s, str.isdigit)]
['d', '13', 'fx', '4', 'e', '2']

Now we zip this up the same way that mshsayem zipped up the characters:

>>> groups = (''.join(group) for (key, group) in itertools.groupby(s, str.isdigit))
>>> ''.join(c*int(d) for (c, d) in zip(groups, groups))
'dddddddddddddfxfxfxfxee'

So:

import itertools

def expand(s):
    groups = (''.join(group) for (key, group) in itertools.groupby(s, str.isdigit))
    return ''.join(c*int(d) for (c, d) in zip(groups, groups))
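As a rough check of how this behaves on a few inputs, here is a self-contained sketch. The function is repeated so that the block runs on its own, and the malformed inputs are my own examples, not something the question specifies.

```python
import itertools

def expand(s):
    groups = (''.join(group) for _, group in itertools.groupby(s, str.isdigit))
    return ''.join(c * int(d) for c, d in zip(groups, groups))

print(expand('d3f4e2'))   # dddffffee
print(expand('ab2c10'))   # 'ab' twice, then 'c' ten times

# A trailing run of letters has no count to pair with, so zip() drops it.
print(expand('d3x'))      # ddd

# Input that starts with digits misaligns the pairs, so int() gets a letter.
try:
    expand('3d')
except ValueError as exc:
    print('rejected:', exc)
```

If inputs like these can occur, the function probably needs explicit validation before the pairing step.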

I really like the groupby function. +1. This is very helpful. Thanks! — ssm, Sep 24 '14 at 01:58

score 2 · Answer 2 · edited May 23 '17 at 10:26

Naive approach (if the digits are only single, and characters are single too):

>>> def expand(s):
       s = iter(s)
       return "".join(c*int(d) for (c,d) in zip(s,s))

>>> expand("d3s5")
'dddsssss'

Poor explanation:

Terms/functions:

iter() gives you an iterator object.
zip() makes tuples from iterables.
int() parses an integer from a string.
<expression> for <variable> in <iterable> is a generator expression (a comprehension that does not build a list).
<string>.join joins an iterable of strings, using the string as the separator.

Process:

First, we make an iterator over the given string.
zip() is used to make tuples of a character and its repeat count, e.g. ('d','3'), ('s','5'). (zip() calls next() on each of its arguments to build every tuple. Both arguments here are the same iterator, so it is advanced twice per tuple.)
Now for ... in iterates over the tuples. Using two variables (c,d) unpacks each tuple into them.
But d is still a string, so int() converts it to an integer.
<string> * <integer> repeats the string that many times.
Finally, join concatenates the pieces and returns the result.
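The process above can be followed by hand with next(). This is only an illustrative sketch of what zip(s, s) does internally; the variable names first and second are made up for the example.

```python
s = iter("d3s5")

# For each tuple, zip(s, s) asks the same iterator for two items in a row.
first = (next(s), next(s))
second = (next(s), next(s))
print(first, second)  # ('d', '3') ('s', '5')

# A plain string is iterable but is not itself an iterator, so each
# argument of zip gets its own independent iterator and every
# character is paired with itself.
print(list(zip("d3", "d3")))  # [('d', 'd'), ('3', '3')]
```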

Here is a multi-digit, multi-char version:

import re

def expand(s):
    s = re.findall(r'([^0-9]+)(\d+)', s)  # raw string so \d reaches the regex engine unchanged
    return "".join(c*int(d) for (c,d) in s)
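To see what the pattern hands to the generator expression, it may help to print the result of re.findall directly. This is a small sketch; the sample inputs are my own.

```python
import re

# With two capture groups, findall returns a list of (letters, digits) tuples.
pairs = re.findall(r'([^0-9]+)(\d+)', 'd13fx4e2')
print(pairs)  # [('d', '13'), ('fx', '4'), ('e', '2')]

# Text that does not fit the letters-then-digits shape is skipped silently:
# the leading '7' and the trailing 'x' never show up in the result.
print(re.findall(r'([^0-9]+)(\d+)', '7d3x'))  # [('d', '3')]
```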

By the way, using itertools.groupby is better, as shown by abarnert.

answered Sep 24 '14 at 01:09 by mshsayem · edited May 23 '17 at 10:26 by Community

This needs some explanation. A novice isn't going to be able to understand that zipping an iterator with itself gives you an iterator over pairs from the underlying iterable. – abarnert Sep 24 '14 at 01:15
or just ` ... in zip( s[::2], s[1::2] )` – ssm Sep 24 '14 at 01:22
One thing: You don't have a list comprehension, you have a generator expression (which is a different kind of comprehension, one that doesn't build a list). Otherwise, pretty nice. – abarnert Sep 24 '14 at 01:30
I don't think this is a very poor explanation, especially since it has the links to let the OP follow up whatever part he doesn't understand. The tutorial sections on [Comprehensions](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions) and [Iterators](https://docs.python.org/3/tutorial/classes.html#iterators) (and the next two sections) might help. Also, [How grouper works](http://stupidpythonideas.blogspot.com/2013/08/how-grouper-works.html) attempts to explain the zipping-iterators thing; I don't know how well it succeeds, but another take on it never hurts. – abarnert Sep 24 '14 at 01:33
@mshsayem Thanks for the explanation. One question for you. Why is the extra step of creating iterable from string necessary? For e.g `zip('abc', 'def')` returns `[('a', 'd'), ('b', 'e'), ('c', 'f')]` but `zip(s,s)` returns `[('d', 'd'), ('3', '3'), ('f', 'f'), ('4', '4'), ('e', 'e'), ('2', '2')]` – linuxfan Sep 24 '14 at 01:42
@linuxfan: Look at what `zip('abcd', 'abcd')` gives you. You get `[('a', 'a'), ('b', 'b'), ('c', 'c'), ('d', 'd')]`. But what you want here is `[('a', 'b'), ('c', 'd')]`. How do you get that? If you create an iterator, and zip it with itself, then the `zip` is advancing the same iterator twice for each tuple, instead of once. (I think I explained it better in that "How grouper works" link above.) – abarnert Sep 24 '14 at 01:48
`zip(s, s)` will return `[('d','d')...]` if `s` is a string (not an iterator object). By making `s` an iterator object we avoid that. now basically, next(s) is called to make the tuples. e.g, first tuple is created by `(next(s), next(s))`. as, `s` is now an iterator, subsequent calls to s (by `next`) will return subsequent items. by the way, see [`next()` here](https://docs.python.org/2/library/functions.html#next). My bad, I cant explain good :( – mshsayem Sep 24 '14 at 01:48
@mshsayem: Be careful with the terminology. Strings are _iterables_; they're just not _iterators_. But yeah, tying it to `next` is probably a good way to help explain things. – abarnert Sep 24 '14 at 01:49

abarnert · Answer 3 · 2014-09-24T01:46:08.957

Let's look at how you could do this manually, using only tools that a novice will understand. It's better to actually learn about zip and iterators and comprehensions and so on, but it may also help to see the clunky and verbose way you write the same thing.

So, let's start with just single characters and single digits:

def expand(s):
    result = ''
    repeated_char_next = True
    for char in s:
        if repeated_char_next:
            # State 1: this character is the one to repeat
            char_to_repeat = char
            repeated_char_next = False
        else:
            # State 2: this character is the single-digit repeat count
            repeat_count = int(char)
            result += char_to_repeat * repeat_count
            repeated_char_next = True
    return result

This is a very simple state machine. There are two states: either the next character is a character to be repeated, or it's a digit that gives a repeat count. After reading the former, we don't have anything to add yet (we know the character, but not how many times to repeat it), so all we do is switch states. After reading the latter, we now know what to add (since we know both the character and the repeat count), so we do that, and also switch states. That's all there is to it.

Now, to expand it to multi-char repeat strings and multi-digit repeat counts:

def expand(s):
    result = ''
    current_repeat_string = ''
    current_repeat_count = ''
    for char in s:
        if char.isdigit():
            current_repeat_count += char
        else:
            if current_repeat_count:
                # We've just switched from a digit back to a non-digit
                count = int(current_repeat_count)
                result += current_repeat_string * count
                current_repeat_count = ''
                current_repeat_string = ''
            current_repeat_string += char
    if current_repeat_count:
        # The input ended on a digit run, so add the last pending pair
        result += current_repeat_string * int(current_repeat_count)
    return result

The state here is pretty similar—we're either in the middle of reading non-digits, or in the middle of reading digits. But we don't automatically switch states after each character; we only do it when getting a digit after non-digits, or vice-versa. Plus, we have to keep track of all the characters in the current repeat string and in the current repeat count. I've collapsed the state flag into that repeat string, but there's nothing else tricky here.
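For comparison, here is a sketch of the same idea with the state kept in an explicit flag instead of being inferred from the count string. The names expand_explicit_state and reading_digits are made up for this illustration, and the behaviour on malformed input is an assumption rather than anything the question requires.

```python
def expand_explicit_state(s):
    result = ''
    repeat_string = ''
    repeat_count = ''
    reading_digits = False  # explicit state: are we inside a run of digits?
    for char in s:
        if char.isdigit():
            reading_digits = True
            repeat_count += char
        else:
            if reading_digits:
                # A digit run just ended, so the pending pair is complete.
                result += repeat_string * int(repeat_count)
                repeat_string = ''
                repeat_count = ''
                reading_digits = False
            repeat_string += char
    if reading_digits:
        # The input ended on a digit run, so add the last pending pair.
        result += repeat_string * int(repeat_count)
    return result

print(expand_explicit_state('d3f4e2'))  # dddffffee
```

A trailing run of letters with no count after it is dropped here, which matches what the zip-based versions do.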

score 0 · Answer 4 · answered Sep 24 '14 at 03:19

There is more than one way to do this, but assuming that the sequence of characters in your input is always the same, e.g. a single character followed by a number, the following would work:

def expand(input):
    alphatest = False
    finalexpanded = ""  # Blank string variable to hold final output
    # first part is used for iterating through range of size i
    # this solution assumes you have a numeric character coming after your
    # alphabetic character every time
    for i in input:
        if alphatest == True:
            i = int(i)  # converts the string number to an integer
            for value in range(0, i):  # loops through range of size i
                finalexpanded += alphatemp  # adds your alphabetic character to string
            alphatest = False  # Once loop is finished resets your alphatest variable to False
            i = str(i)  # converts i back to string to avoid error from i.isalpha() test
        if i.isalpha():  # tests i to see if it is an alphabetic character
            alphatemp = i  # sets alphatemp to i for loop above
            alphatest = True  # sets alphatest True for loop above
    print(finalexpanded)  # prints the final result
