1

I have a string that contains multiple occurrences of Q =, and the goal is to add the occurrence number after each Q.

For example, 'Q = 1 t h \n Q = 2 t h \n Q = 3 t h' should be 'Q1 = 1 t h \n Q2 = 2 t h \n Q3 = 3 t h'

Here's my method:

import re

test = 'Q = 1 t h \n Q = 2 t h \n Q = 3 t h'
num = test.count('Q =')
pattern = re.compile('[Q]')

for n in range(num):
    where = [m for m in pattern.finditer(test)]
    test = test[:where[n].start()+1]+f'{n+1}'+test[where[n].start()+1:]

print(test)

Is there any better solution?

zxdawn
  • For what it's worth, the `re` standard library module maintains a cache of compiled patterns, so putting `re.compile` inside a loop isn't a meaningful performance hit. It's still good practice to put it outside the loop, however, since that shows a proper understanding of the purpose. – Karl Knechtel Apr 08 '23 at 11:44

3 Answers

2

Possible approaches:

  1. Use the .sub method of regular expressions to replace each matched occurrence. (I wanted to include a reference link for that, but I couldn't find a decent question that directly asks about using regex for substitution where OP didn't already know about re.sub.) It can accept a function that takes the match and returns the replacement string.

    The function we need doesn't actually need the matched string for its logic (because it's a constant and because taking it apart would be harder than just re-creating the parts we need), but needs to give a different result each time. We can create those results with logic that simply interpolates numbers sourced from itertools.count (which produces an unlimited sequence of integers counting up, on demand). Since the function only needs to consume that iterator as a one-off, we can declare it locally, and (carefully) use it as a closure in a lambda.

    import re
    from itertools import count
    
    pattern = re.compile('Q =')
    test = 'Q = 1 t h \n Q = 2 t h \n Q = 3 t h'
    c = count(1)
    test = pattern.sub(lambda _: f'Q{next(c)} =', test)
    
  2. Instead of regular expressions, since the searched-for substring is a constant, just .split() the string on that value, generate and collect the replacement strings, interweave the replacements with the parts in between and join the results again.

    from itertools import count
    
    test = 'Q = 1 t h \n Q = 2 t h \n Q = 3 t h'
    
    between = test.split('Q =')
    parts = [None] * (len(between) * 2 - 1)
    parts[::2] = between
    parts[1::2] = [f'Q{i} =' for i in range(1, len(between))]
    test = ''.join(parts)
    
  3. Supposing that we can assume that the string is also split into lines (i.e., there is \n at the end of each "question" part): split the string into those lines, replace the start of each line using similar techniques, and join the lines back together:

    test = 'Q = 1 t h \n Q = 2 t h \n Q = 3 t h'
    lines = test.split('\n')
    test = '\n'.join(
        line.replace('Q =', f'Q{i} =') for i, line in enumerate(lines, 1)
    )
    

    Here, enumerate is a built-in function that replaces the count logic - it automatically matches up values like the ones from itertools.count, with values taken from another sequence (here, the lines of the original text). See also Accessing the index in 'for' loops.

  4. If we can't make that assumption, use regex to split the string instead. Use a lookahead assertion so that the actually matched pattern doesn't include any text, but matches right before each Q =, allowing the string to be split in those places. Then make the same replacements as before. This time, we join with an empty string, because we did not remove newlines from the original data.

    import re
    
    pattern = re.compile('(?=Q =)')
    test = 'Q = 1 t h \n Q = 2 t h \n Q = 3 t h'
    lines = pattern.split(test)
    test = ''.join(
        line.replace('Q =', f'Q{i} =') for i, line in enumerate(lines)
    )
    

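    Note that this time, we start the enumeration from 0, because splitting this way produces an empty string as the first element of lines (the input starts with Q =). Splitting on a pattern that matches an empty string requires Python 3.7 or later.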
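The first approach can also be wrapped in a small helper so that the counter does not live at module level. This is only a sketch: the name `number_occurrences` and its parameters are hypothetical (they are not part of the answer above), and it assumes the marker is a fixed literal such as `Q =`.

```python
import re
from itertools import count


def number_occurrences(text, prefix='Q', suffix=' =', start=1):
    """Insert a running number between prefix and suffix at each occurrence.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    counter = count(start)  # fresh counter for every call
    # re.escape keeps the literal marker from being read as regex syntax.
    pattern = re.compile(re.escape(prefix + suffix))
    # The callback ignores the Match object and emits the next number.
    return pattern.sub(lambda _match: f'{prefix}{next(counter)}{suffix}', text)


test = 'Q = 1 t h \n Q = 2 t h \n Q = 3 t h'
print(number_occurrences(test))
```

Creating the counter inside the function means that repeated calls each start numbering again from `start`, which a shared module-level `count` would not do.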

Karl Knechtel
  • 1
    I'd say just use a lambda: `test = re.sub('Q', lambda _, c=count(1): f'Q{next(c)}', test)` – Kelly Bundy Apr 08 '23 at 14:56
  • 1
    That's not just simpler but I think also cleaner, as it ignores the `Match` object explicitly instead of feeding it to `next` as the `default` value, which is weird. – Kelly Bundy Apr 08 '23 at 15:02
  • @KellyBundy while I was thinking about how to convert the generator to a function, I overlooked that it would receive that parameter at all. Generally I strongly dislike the default-parameter binding style, for reasons that don't fit in the comment section - which is why I put a lot of effort into finding and establishing a canonical for parameter binding that prioritizes `functools.partial`. That said, I agree that binding `count` to a function that uses it is a lot cleaner than making the entire generator and then binding it to `next`. – Karl Knechtel Apr 09 '23 at 13:54
  • @KellyBundy I ended up editing the code and just using a closure instead of any kind of binding to the callback. (This is safe even [in a loop](https://stackoverflow.com/questions/3431676) *because the callback is used immediately* rather than outliving the loop.) – Karl Knechtel Apr 09 '23 at 14:09
  • Yeah, closure is nicer. I mostly did it inline because you had, too. – Kelly Bundy Apr 09 '23 at 14:26
  • 1
    Btw, if you're into assumptions (like your 3.), another option is `test = test.replace('Q', 'Q%i') % (*range(1, num+1),)` or `test = test.replace('Q', 'Q{}').format(*range(1, num+1))`. I just wasn't happy enough with them to post them. – Kelly Bundy Apr 09 '23 at 15:02
  • That's the problem with questions where OP suspects regex will be useful: they're nearly always underspecified. – Karl Knechtel Apr 09 '23 at 15:33
1

I would use a regex approach like this:

out = re.sub(r"Q\s*=\s*(\d+)", r"Q\1 = \1", test)

Or simply use replace (to follow your approach):

out = "\n".join([l.replace("Q =", f"Q{i+1} =") for i, l in enumerate(test.split("\n"))])

Output:

>>> out
'Q1 = 1 t h \n Q2 = 2 t h \n Q3 = 3 t h'
Timeless
  • 2
    The regex option shown here is not at all robust. It is relying on the fact that the number to write after the `Q` *just happens* to be the same as the number that comes next after the `=`. It's also assuming that the text after the `=` is a sequence of digits, possibly with whitespace in between. I don't think either of those is supposed to be part of the problem specification, it's just how the example happened to be written. – Karl Knechtel Apr 08 '23 at 11:35
  • Isn't the example supposed to describe the OP's expected output? TBH, this one is not fully clear (*at least for me*) from what we read in the question, that's why I added a second approach that matches the OP's. Thanks Karl ;) – Timeless Apr 08 '23 at 11:38
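To make the robustness concern from the comments concrete, here is a small sketch with a hypothetical input (`shuffled`, which is not from the question) where the value after `=` does not match the position of the `Q`. The backreference version copies whatever digits follow the `=`, while a counter-based callback numbers by order of appearance.

```python
import re
from itertools import count

# Hypothetical input: the values after '=' are not 1, 2, 3.
shuffled = 'Q = 7 t h \n Q = 7 t h \n Q = x t h'

# Backreference approach: reuses the digits captured after '='.
by_backref = re.sub(r'Q\s*=\s*(\d+)', r'Q\1 = \1', shuffled)

# Counter approach: numbers each 'Q =' by order of appearance.
counter = count(1)
by_count = re.sub(r'Q =', lambda _m: f'Q{next(counter)} =', shuffled)

print(repr(by_backref))
print(repr(by_count))
```

On this input the backreference version labels two entries identically and leaves the non-numeric one unlabeled, so it is only suitable if the example in the question is taken literally.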
0

If your strings don't contain curly braces, you could use the format function to inject the sequence numbers:

test = "Q = 1 t h \n Q = 2 t h \n Q = 3 t h"

result = test.replace("Q =","Q{} =").format(*range(1,test.count("Q =")+1))

print(result)

Q1 = 1 t h 
 Q2 = 2 t h 
 Q3 = 3 t h

If there could be curly braces, you could use split instead, separating the string using "Q =" as the separator and reassembling it with the injected numbering:

result = "".join(f"Q{i} ="*(i>0)+s for i,s in enumerate(test.split("Q =")))
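A quick sketch of why the curly-brace caveat matters, using a hypothetical input (`with_braces`, not from the question). `str.format` tries to interpret the existing braces as replacement fields, whereas the split-based version only touches the `Q =` separators.

```python
# Hypothetical input containing literal curly braces.
with_braces = 'Q = {a} \n Q = {b}'

try:
    # '{a}' is parsed as a named field that format() cannot fill here.
    with_braces.replace('Q =', 'Q{} =').format(
        *range(1, with_braces.count('Q =') + 1)
    )
    format_ok = True
except (KeyError, IndexError, ValueError):
    format_ok = False

# The split/join version never interprets the braces.
numbered = ''.join(
    f'Q{i} =' * (i > 0) + part
    for i, part in enumerate(with_braces.split('Q ='))
)

print(format_ok)
print(repr(numbered))
```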

Or you could just use a loop that replaces the "Q =" one at a time:

for i in range(test.count("Q =")):
    test = test.replace("Q =",f"Q{i+1} =",1)

print(test)

Q1 = 1 t h 
 Q2 = 2 t h 
 Q3 = 3 t h
Alain T.