Python match and replace: what am I doing wrong?

Question

I have a regular expression that matches some data (it is here), and now I am trying to replace all of the matched data with a single `:` character.

import re

test_str = u"THERE IS MY DATA"
p = re.compile(ur'[a-z]+([\n].*?<\/div>[\n ]+<div class="large-3 small-3 columns">[\n ]+)[a-z]+', re.M|re.I|re.S)
print re.sub(p, r':/1', test_str)

I have tried a few other approaches, but either nothing gets replaced, or the whole pattern gets replaced instead of only the matched group.

What does `/1` mean? Remove it if you need to [replace the matched data with just one `:`](https://regex101.com/r/tR3bW9/1). However, when manipulating HTML, you'd be safer using an HTML parser. — Wiktor Stribiżew, Jan 28 '16 at 11:03
Also notice that regex101 has a substitution section that would have let you spot the error easily. — Cyrbil, Jan 28 '16 at 11:05
Please see [this answer](http://stackoverflow.com/a/1732454/1250422) and the answers below it for very good reasons why you should **not** be doing this. — Archimaredes, Jan 28 '16 at 11:08
@WiktorStribiżew Doesn't `/1` mean "first matched group"? I found it somewhere on Google. When I remove it, the replacement works, but it replaces the whole pattern rather than just the group in `()`. Which HTML parser for Python do you have in mind? – MastaBot, Jan 28 '16 at 11:32
@MastaBot: I worked with BeautifulSoup only, and can recommend it. — Wiktor Stribiżew, Jan 28 '16 at 11:33
@Archimaredes I tried to understand that text and read it twice, but I don't see any reason why I shouldn't use regex. Or maybe my English is too bad, sorry. – MastaBot, Jan 28 '16 at 11:34
Sorry @MastaBot, that post is mostly a joke - but its point is entirely serious. It is impossible to handle HTML entirely using regular expressions, so use a parser. :) — Archimaredes, Jan 28 '16 at 11:38
@Archimaredes I still don't understand why it is impossible. I now know there are probably better tools for this than regex, but surely it is still doable with regex. – MastaBot, Jan 28 '16 at 11:45
Okay, I'll admit it; 'impossible' is maybe too far. [This](http://stackoverflow.com/q/6751105/1250422) is a good read. — Archimaredes, Jan 28 '16 at 11:48

Mostafa Wattad · Answer 1 · 2016-01-28T12:17:34.693

0

1) It's a backslash issue.
Use `print re.sub(p, r':\1', test_str)`, not `print re.sub(p, r':/1', test_str)`.
2) You are replacing the whole match with `:\1`, which means the entire matched text is replaced with `:` followed by the first group of the regex.
To replace just that group within the text, add two more groups, one before it and one after it, and refer to them in the replacement. I hope this will fix the issue:

import re

test_str = u"THERE IS MY DATA"
# Groups: 1 = leading word, 2 = the markup between the words, 3 = trailing word
p = re.compile(ur'([a-z]+)([\n].*?<\/div>[\n ]+<div class="large-3 small-3 columns">[\n ]+)([a-z]+)', re.M|re.I|re.S)
print re.sub(p, r'\1:\2\3', test_str)
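
To make the difference between the replacement strings concrete, here is a minimal sketch in Python 3 syntax (the snippets above use Python 2). The sample input `html` is made up for illustration, since the real data is not shown in the question, and the names `slash_result` and `colon_result` are only for this example. Note that `\1:\2\3` keeps the middle group and inserts a `:` before it; if the goal is to swap the middle group for a single `:`, leaving `\2` out of the replacement should do that.

```python
import re

# Hypothetical sample input; the real data is not shown in the question.
html = (
    'abc\n'
    '  text</div>\n'
    '  <div class="large-3 small-3 columns">\n'
    '  def'
)

# Same idea as the pattern above, split into three capturing groups:
# 1 = leading word, 2 = the markup to drop, 3 = trailing word.
pattern = re.compile(
    r'([a-z]+)'
    r'(\n.*?</div>[\n ]+<div class="large-3 small-3 columns">[\n ]+)'
    r'([a-z]+)',
    re.I | re.S,  # ignore case; let '.' match newlines too
)

# '/1' is plain text, not a backreference, so it is inserted literally
# and the whole match is lost.
slash_result = pattern.sub(r':/1', html)

# Keep groups 1 and 3 and put a single ':' where group 2 used to be.
colon_result = pattern.sub(r'\1:\3', html)

print(slash_result)  # :/1
print(colon_result)  # abc:def
```

With this input the entire string is one match, so the first call returns only the literal `:/1`, while the second keeps the surrounding words and replaces just the markup between them.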

edited Jan 28 '16 at 12:17

answered Jan 28 '16 at 11:25


You are right. Still, after I corrected this, it replaces the whole pattern, not only the matched part in `()`. – MastaBot Jan 28 '16 at 11:41
