python randomizer - get random text between curly braces with double nesting level

Question

Hey, I need to create a simple Python randomizer. Example input:

{{hey|hello|hi}|{privet|zdravstvuy|kak dela}|{bonjour|salut}}, can {you|u} give me advice?

and the output should be something like:

hello, can you give me advice?

I have a script that can do this, but only for one level of nesting:

import random
import re

with open('text.txt', 'r') as text:
    # grab every run of characters that contains no braces
    matches = re.findall('([^{}]+)', text.read())
words = []
for match in matches:
    parts = match.split('|')
    if parts[0]:
        words.append(random.choice(parts))
message = ''.join(words)

This is not enough for me.

Seems to me that your input follows a grammar that is a bit too complicated for simple regular expressions. I'd say, build a proper lexical analyzer that's invoked by a parser to produce your output. If you're unfamiliar with this concept, I suggest you first read up on the theory :) — Karel Kubat, Jan 21 '15 at 11:28
You are looking for recursive regex matching. See: http://stackoverflow.com/questions/1656859/how-can-a-recursive-regexp-be-implemented-in-python — jean-loup, Jan 21 '15 at 11:29
@KarelKubat Oh no, I don't need this. I just want to get random text from curly braces that contain another set of curly braces. — Alice Polansky, Jan 21 '15 at 11:29
And what will you do with more nesting as in {{a|{c|d}|e}|f}? — Karel Kubat, Jan 21 '15 at 11:31
Since there's only one operator, you don't need extra braces. `{a | {b | c}}` is essentially the same as `{a | b | c}`. — georg, Jan 21 '15 at 11:34
@georg This is probabilistically wrong for the example in the post. — jean-loup, Jan 21 '15 at 11:38
@jean-loup: theoretically yes, but I don't know if the difference between probabilities 0.15 and 0.125 is important for the OP. — georg, Jan 21 '15 at 11:42
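
To make the probability point in the comments above concrete, here is a small sketch that works out the exact chance of each greeting for the example in the question, once with the nested groups kept and once with them flattened into a single group. The group labels (`english`, `russian`, `french`) are made up for illustration, and the sketch assumes every choice is uniform.

```python
from fractions import Fraction

# Sizes of the three inner groups in the question's example:
# {hey|hello|hi}, {privet|zdravstvuy|kak dela}, {bonjour|salut}
group_sizes = {"english": 3, "russian": 3, "french": 2}
n_groups = len(group_sizes)

# Nested: pick a group uniformly, then a word uniformly inside it.
nested = {
    name: Fraction(1, n_groups) * Fraction(1, size)
    for name, size in group_sizes.items()
}

# Flattened: all eight words sit in one group and are equally likely.
flat = Fraction(1, sum(group_sizes.values()))

for name, p in nested.items():
    print(name, "word, nested:", p, "flattened:", flat)
```

Under these assumptions each English or Russian word has probability 1/9 when nested, each French word has 1/6, and every word has 1/8 when flattened. Whether that difference matters depends on the use case.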

score 2 · Accepted Answer · answered Jan 21 '15 at 11:43

Python's built-in `re` module does not support recursive (nested) patterns, so you'll have to find some other way to parse the string.

Here's my quick kludge:

import random

def randomize(text):
    start= text.find('{')
    if start==-1: #if there are no curly braces, there's nothing to randomize
        return text

    # parse the choices we have
    end= start
    word_start= start+1
    nesting_level= 0
    choices= [] # list of |-separated values
    while True:
        end+= 1
        try:
            char= text[end]
        except IndexError:
            break # if there's no matching closing brace, we'll pretend there is.
        if char=='{':
            nesting_level+= 1
        elif char=='}':
            if nesting_level==0: # matching closing brace found - stop parsing.
                break
            nesting_level-= 1
        elif char=='|' and nesting_level==0:
            # put all text up to this pipe into the list
            choices.append(text[word_start:end])
            word_start= end+1
    # there's no pipe character after the last choice, so we have to add it to the list now
    choices.append(text[word_start:end])
    # recursively call this function on each choice
    choices= [randomize(t) for t in choices]
    # return the text up to the opening brace, a randomly chosen string, and
    # don't forget to randomize the text after the closing brace 
    return text[:start] + random.choice(choices) + randomize(text[end+1:])
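
For comparison, the parser idea raised in the comments can be sketched as a small recursive-descent routine. This is an illustrative alternative, not the code above: the names `expand`, `_sequence` and `_group` are hypothetical, the optional `rng` parameter is an addition that makes runs reproducible, and an unclosed group is silently tolerated.

```python
import random

def expand(text, rng=random):
    """Expand {a|b|{c|d}} templates, picking one alternative per group."""
    result, _ = _sequence(text, 0, rng, top_level=True)
    return result

def _sequence(text, pos, rng, top_level):
    """Read literal text and groups until '|' or '}' (or end of input)."""
    parts = []
    while pos < len(text):
        ch = text[pos]
        if ch == '{':
            chosen, pos = _group(text, pos + 1, rng)
            parts.append(chosen)
        elif ch in '|}' and not top_level:
            break  # the enclosing group decides what to do with it
        else:
            parts.append(ch)  # outside any group, '|' and '}' are literals
            pos += 1
    return ''.join(parts), pos

def _group(text, pos, rng):
    """Read the alternatives after '{' up to the matching '}' and pick one."""
    alternatives = []
    while True:
        alt, pos = _sequence(text, pos, rng, top_level=False)
        alternatives.append(alt)
        if pos < len(text) and text[pos] == '|':
            pos += 1  # skip the separator and read the next alternative
            continue
        # Either a closing '}' or end of input: step past it and choose.
        return rng.choice(alternatives), pos + 1

msg = '{{hey|hello|hi}|{privet|zdravstvuy|kak dela}|{bonjour|salut}}, can {you|u} give me advice?'
print(expand(msg, random.Random(1)))
```

Passing a seeded `random.Random` instance makes the output repeatable, which helps when testing. By default the module-level `random` functions are used.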

georg · Answer 2 · 2015-01-21T12:04:25.487

As I said above, nesting is essentially useless here, but if you want to keep your current syntax, one way to handle it is to replace braces in a loop until there are no more:

import re, random

msg = '{{hey|hello|hi}|{privet|zdravstvuy|kak dela}|{bonjour|salut}}, can {you|u} give me advice?'


while re.search(r'{.*}', msg):
    msg = re.sub(
        r'{([^{}]*)}', 
        lambda m: random.choice(m.group(1).split('|')), 
        msg)

print(msg)
# zdravstvuy, can u give me advice?
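
The same inside-out idea can be wrapped in a reusable function. This is a sketch under a few assumptions: the name `spin` and the `rng` parameter are made up for illustration, and `re.subn` is used so the loop stops as soon as a pass replaces nothing.

```python
import random
import re

# Matches an innermost group: braces with no other braces inside.
INNERMOST = re.compile(r'\{([^{}]*)\}')

def spin(text, rng=random):
    """Repeatedly resolve innermost {a|b|c} groups until none remain."""
    while True:
        text, count = INNERMOST.subn(
            lambda m: rng.choice(m.group(1).split('|')),
            text)
        if count == 0:
            return text

msg = '{{hey|hello|hi}|{privet|zdravstvuy|kak dela}|{bonjour|salut}}, can {you|u} give me advice?'
print(spin(msg))
```

Each pass replaces every innermost group with one of its alternatives, so an outer group becomes innermost on the next pass and is then resolved in turn.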
