2

I am a beginner in Python and am currently struggling with something:

I want to make a couple of changes in a single string. Is it possible to use a single asterisk (*) as a wildcard that stands in for any run of characters? For example, I have a string:

string1 = "The new year is about to become an old year"

And I want to use this pattern for finding:

find:
*year*year*

replace it with:
*century*one*

Which will result in:

string1 = "The new century is about to become an old one"

Meaning the "*" character stands for all the characters before, between, and after the two "year" words. Is that possible?

thoni56
marco

4 Answers

5

It will be worth your while to look into regular expressions. In your case, the main things you need to know are that `.` matches any single character, `.*` matches zero or more of any character, that parentheses are used for grouping, and that a backslash followed by a number forms a backreference (to an existing group).

So, to match `year`, followed by arbitrary stuff, followed by `year` again, use `year.*year`.

Now, to substitute, use the grouping and backreference:

import re
string2 = re.sub('year(.*)year', r'century\1one', string1)
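
To see the substitution in action, along with one caveat, here is a small self-contained sketch. The three-"year" sample string and the variable names `three`, `greedy`, and `lazy` are made up for illustration and are not from the question. It shows that `.*` is greedy and reaches to the last `year`, while `.*?` stops at the nearest one.

```python
import re

string1 = "The new year is about to become an old year"

# (.*) captures everything between the two "year"s;
# \1 in the replacement puts that captured text back.
string2 = re.sub(r'year(.*)year', r'century\1one', string1)
print(string2)  # The new century is about to become an old one

# Illustrative string with three occurrences of "year".
three = "year one, year two, year three"

# Greedy .* stretches from the first "year" to the LAST one.
greedy = re.sub(r'year(.*)year', r'century\1one', three)
print(greedy)  # century one, year two, one three

# Non-greedy .*? stops at the NEAREST following "year".
lazy = re.sub(r'year(.*?)year', r'century\1one', three)
print(lazy)  # century one, one two, year three
```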

Effective use of regular expressions is definitely not obvious to most beginners. For some suggestions on gentler introductions, see this question:

https://stackoverflow.com/questions/2717856/any-good-and-gentle-python-regexp-tutorials-out-there

The above question has been deleted, and many of the links there are dead anyway. A few from there that remain valid as of this writing:

And, of course, Googling should turn up plenty of resources.

John Y
  • Thank you John. The Python docs are terrible material for beginners. Your code works perfectly. But what if I change `find` and `replace`: `string1 = "The new year is about to become an old year in 4 days"`, `find = "*year*year*4*"`, `replace = "century*one*10*"`. In what way do I need to change your code in order for this new case to work? – marco Dec 27 '13 at 13:31
  • I would argue the Python docs as a whole are at least as beginner-friendly as most documentation out there. The tutorial in particular is solidly above average, though of course it's going to be even better suited to programmers from other languages than newcomers to programming. The issue you're having is that regular expressions are **especially** (and intrinsically!) geared toward experienced programmers. @dmvianna already included some links for further reading; I'll add some more to my answer. – John Y Dec 27 '13 at 14:21
  • I like @JohnY's answer better than mine, but to add a numeric literal after a numeric backreference, you need to treat it as if it were an arbitrary one (as in my first answer). So your code would be `re.sub('year(.*)year(.*)4', r'century\1one\g<2>10', string1)`. See [this post](http://stackoverflow.com/questions/5984633/python-re-sub-group-number-after-number). – dmvianna Dec 29 '13 at 00:09
  • Thank you. What does the "\g<2>" represent? – marco Dec 29 '13 at 01:27
  • It's in the link @dmvianna provided when he said "See this post." Did you not find it clear enough, or did you just not bother to read it? We're all trying to help you, but you have to show *some* kind of effort of your own. – John Y Dec 29 '13 at 07:23
  • Just as `\1` represents 'group 1', `\g<2>` represents 'group 2', but in a more explicit way that doesn't trip the interpreter when the group number is followed by numerals. By the way, did you notice that the first argument we're using to match the text now closely resembles your 'asterisk character' string? All the debate is now on how to correctly reference the groups in the replacement string (the second argument). Also notice we use `r"string"` -- the leading r tells Python to read the string literally (raw), without interpreting backslashes as escapes. – dmvianna Dec 29 '13 at 11:09
  • @dmvianna Thank you for this detailed explanation. I guess some of the things that I still do not understand are a result of my lack of knowledge. I am just trying to understand the pattern - so would it be rude if I add one more word for find ("some") and replacement ("any")? For example, is this the correct way or did I get it all wrong: `re.sub('year(.*)year(.*)4(.*)some', r'century\1one\g<2>10\g<3>any', string1)` – marco Dec 29 '13 at 11:21
  • Correct. Works for me with `string = 'the new year is about to become an old year in 4 days or sometime.'` You will gain knowledge by working through the tutorials, and coming here only when you're stuck and can't find the answer on Google. Also, it really helped me to follow [@RegexTip](https://twitter.com/RegexTip) on Twitter. – dmvianna Dec 29 '13 at 21:44
  • I am very grateful for the help you gave me @dmvianna. Will follow the sources you gave me. Thank you John Y too! What a wonderful community stackoverflow is. – marco Dec 30 '13 at 00:24
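
The `\g<2>` point from the comments above can be checked with a short sketch. It uses the string and pattern from the comments; the variable names `pattern`, `result`, and `ambiguous` are only illustrative.

```python
import re

string1 = "The new year is about to become an old year in 4 days"
pattern = r'year(.*)year(.*)4'

# \g<2> means "group 2", so the literal 10 that follows stays separate.
result = re.sub(pattern, r'century\1one\g<2>10', string1)
print(result)  # The new century is about to become an old one in 10 days

# Writing \210 instead is NOT read as "group 2 followed by 10":
# the digits after the backslash are consumed together, so the output differs.
ambiguous = re.sub(pattern, r'century\1one\210', string1)
print(repr(ambiguous))
```

When a backreference is followed directly by a digit, the `\g<number>` form avoids the ambiguity.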
4

You don't need asterisks. Just use

import re
string1 = "The new year is about to become an old year"
new_string = re.sub(r"(?P<y>year)(.*)(?P=y)", r"century\2one", string1)

Or more concisely:

new_string = re.sub(r"(year)(.*)\1", r"century\2one", string1)

One pass, using regular expressions. Explanation: each capturing pair of parentheses in the first argument defines one group. The first is named "y" (with `?P<y>`) and matches the literal `year`; the second matches any number (`*`) of any character (`.`). The final `(?P=y)` is not a new group but a backreference: it matches the same text that group "y" already matched (in our case, "year"). The second argument replaces the whole match with `century`, followed by the text captured by group 2, followed by `one`. Notice that capturing groups are numbered from 1; group 0 refers to the entire match.
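
One way to check how the groups are numbered is to inspect the match object directly. This is a minimal sketch using the same pattern; the variable name `m` is just for illustration.

```python
import re

string1 = "The new year is about to become an old year"
m = re.search(r"(?P<y>year)(.*)(?P=y)", string1)

print(m.group(0))        # the entire match
print(m.group(1))        # first capturing group: year
print(m.group('y'))      # the same group, looked up by name: year
print(repr(m.group(2)))  # second group: the text between the two "year"s
print(len(m.groups()))   # 2, because (?P=y) is a backreference, not a group
```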

Kudos to @JohnY for the pointers in the comments below, and also m.buettner. My heroes!

It seems to me you haven't heard of regular expressions (or regex) yet. Regex is a very powerful mini language that is used to match text. Python has a very good implementation of regex. Have a look at:

Tutorial at Regex One

Python Documentation on Regex

Community
dmvianna
  • Thank you for the reply dmvianna. But that did not exactly do what I expect. Your code returned this: "The new century is about to become an old century" While I am looking for this: "The new century is about to become an old one" – marco Dec 26 '13 at 21:33
  • Your "solution" differs from what the OP is asking for, and doesn't help. It also doesn't even illustrate the usefulness of regex functionality. If all the OP wanted to do was replace `year` with `century`, they could simply do `string1.replace('year', 'century')`. – John Y Dec 26 '13 at 21:34
  • @JohnY Noted and fixed. ;7) – dmvianna Dec 26 '13 at 21:42
  • Still not getting the right sentence. The initial string1 is: string1 = "The new year is about to become an old year" – marco Dec 26 '13 at 21:48
  • Fixed. Made it very simple, too. ;7) – dmvianna Dec 26 '13 at 21:57
  • You should definitely be able to do it in one pass. That stuff between the first and second instances of `year` can be preserved as a group, and that group can be used in the replacement text. – John Y Dec 26 '13 at 21:59
  • @JohnY Sure I can match the text in one pass, but how would I replace it with two different words? I'm keen to know. – dmvianna Dec 26 '13 at 22:05
  • @dmvianna: Read the documentation (standard library reference, not the how-to) on `re.sub` and note the mention of *backreferences*. You're not replacing one word with two different words; you're replacing a start token and an end token (which *just happen* to be the same), and preserving the stuff in the middle. – John Y Dec 26 '13 at 22:17
  • @JohnY, you are my hero. :D – dmvianna Dec 26 '13 at 23:02
  • +1 But I'm not convinced the named captures are worth the trouble for such a simple case. This seems much easier to grok: `r'(year)(.*)\1'`. – FMc Dec 26 '13 at 23:31
  • @FMc +1, true. This is just me being excited with a brand new skill. :) Also, it may be overkill, but I find the \1 less intuitive for a new learner than making sure every group is inside parentheses. And showing that you _could_ give any arbitrary name for a group. – dmvianna Dec 26 '13 at 23:34
  • Thank you dmvianna. I am trying to understand some pattern here. What if I change string1, find and replace: `string1 = "The new year is about to become an old year in 4 days"`, `find = "year*year*4"`, `replace = "century*one*10*"`. How will your code look now? – marco Dec 27 '13 at 13:40
  • @dmvianna: I am glad you found my comments enlightening, though I personally was thinking of something a little different (see my answer). – John Y Dec 27 '13 at 14:30
  • @dmvianna Hi. Is there a way your code could be changed to suit the change above: `string1 = "The new year is about to become an old year in 4 days"`, `find = "year*year*4"`, `replace = "century*one*10*"`? Thank you. – marco Dec 28 '13 at 10:14
1
string1 = "The new year is about to become an old year"
find = '*year*year*'
replace = '*century*one*'

for f, r in zip(find.strip('*').split('*'), replace.strip('*').split('*')):
    string1 = string1.replace(f, r, 1)

Output:

The new century is about to become an old one
Omid Raha
0

This is a sample implementation that does not do any error checking.

>>> def custom_replace(s, find_s, replace_s):
...     terms = find_s.split('*')[1:-1]
...     replacements = replace_s.split('*')[1:-1]
...     for term, replacement in zip(terms, replacements):
...         s = s.replace(term, replacement, 1)
...     return s
... 
>>> string1 = "The new year is about to become an old year"
>>> print(custom_replace(string1, "*year*year*", "*century*one*"))
The new century is about to become an old one
>>> 
yan
  • Thank you for the reply Yan. But this solution does not work when * is removed from the start of the "find" and "replace" patterns. For example: `string1 = "year is about to become an old year"` and `print custom_replace(string1, "year*year*", "century*one*")` will result in "one is about to become an old year", which is not correct. – marco Dec 26 '13 at 21:41
  • Well, your example had asterisks on the ends. If you want to match without them, remove the `[1:-1]` from lines 2 and 3. – yan Dec 26 '13 at 21:59
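
Combining the split-on-asterisk idea from this answer with the regular expressions from the earlier answers, one possible sketch of a general helper follows. The name `wildcard_sub` and its behaviour are assumptions made for illustration, not an existing library function: each `*` in `find` becomes a `(.*)` group, and the n-th `*` in `replace` is filled with whatever the n-th `*` in `find` matched. It is a rough sketch with minimal error checking.

```python
import re

def wildcard_sub(text, find, replace):
    """Hypothetical helper: '*' in `find` matches any run of characters, and
    the n-th '*' in `replace` receives what the n-th '*' in `find` matched."""
    # Escape the literal parts so characters like '.' or '+' are not special.
    literals = [re.escape(part) for part in find.split('*')]
    pattern = '(.*)'.join(literals)

    pieces = replace.split('*')
    if len(pieces) > len(literals):
        raise ValueError("replace contains more '*' than find")

    def build(match):
        # Interleave the literal replacement pieces with the captured groups.
        out = [pieces[0]]
        for i, piece in enumerate(pieces[1:], start=1):
            out.append(match.group(i))
            out.append(piece)
        return ''.join(out)

    # count=1: substitute only the first match; DOTALL lets '*' span newlines.
    return re.sub(pattern, build, text, count=1, flags=re.DOTALL)

print(wildcard_sub("The new year is about to become an old year",
                   "*year*year*", "*century*one*"))
# The new century is about to become an old one
```

Because the pattern is anchored only by its literal parts, this sketch also handles patterns without a leading asterisk, such as `"year*year*"` with `"century*one*"`.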