I have a regex that matches some data (it is here), and now I'm trying to replace all of the matched data with a single `:` character.

import re
test_str = u"THERE IS MY DATA"
p = re.compile(ur'[a-z]+([\n].*?<\/div>[\n ]+<div class="large-3 small-3 columns">[\n ]+)[a-z]+', re.M|re.I|re.S)
print re.sub(p, r':/1', test_str)

I've tried it a few other ways, but either nothing is replaced, or the whole pattern is replaced instead of only the matched group.

MastaBot
  • What does `/1` mean? Remove it if you need to [replace the matched data with just one `:`](https://regex101.com/r/tR3bW9/1). However, when manipulating HTML, you'd be safer using an HTML parser. – Wiktor Stribiżew Jan 28 '16 at 11:03
  • Also notice that regex101 has a substitution section that would have let you spot the error easily. – Cyrbil Jan 28 '16 at 11:05
  • Please see [this answer](http://stackoverflow.com/a/1732454/1250422) and the answers below it for very good reasons why you should **not** be doing this. – Archimaredes Jan 28 '16 at 11:08
  • @WiktorStribiżew Doesn't `/1` mean 'first matched group'? I found it somewhere via Google. When I remove it, the replacement works, but it replaces the whole pattern rather than just the group in `()`. Which HTML parser for Python do you have in mind? – MastaBot Jan 28 '16 at 11:32
  • @MastaBot: I've only worked with BeautifulSoup, and I can recommend it. – Wiktor Stribiżew Jan 28 '16 at 11:33
  • @Archimaredes I tried to understand that text and read it twice, but I don't see any reason why I shouldn't use regex, or maybe my English is too bad, sorry. – MastaBot Jan 28 '16 at 11:34
  • @WiktorStribiżew I'll give it a try, thanks. – MastaBot Jan 28 '16 at 11:38
  • Sorry @MastaBot, that post is mostly a joke - but its point is entirely serious. It is impossible to handle HTML entirely using regular expressions, so use a parser. :) – Archimaredes Jan 28 '16 at 11:38
  • @Archimaredes I still don't understand why it is impossible. I now know there are probably better tools for this than regex, but surely it is still doable with regex. – MastaBot Jan 28 '16 at 11:45
  • Okay, I'll admit it; 'impossible' is maybe too far. [This](http://stackoverflow.com/q/6751105/1250422) is a good read. – Archimaredes Jan 28 '16 at 11:48
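To make the `/1` question in the comments concrete: in a `re.sub` replacement string a group is referenced as `\1` (or `\g<1>`), while `/1` is just the two literal characters `/` and `1`, and a bare `':'` replaces the entire match. The sketch below is Python 3; the sample string is hypothetical, since the real input is not shown, and `re.S` stands in for the question's `re.SE`, which the `re` module does not define.

```python
import re

# The question's pattern in Python 3 syntax. re.M is dropped because the
# pattern has no ^ or $ anchors for it to affect.
pattern = re.compile(
    r'[a-z]+([\n].*?</div>[\n ]+<div class="large-3 small-3 columns">[\n ]+)[a-z]+',
    re.I | re.S,
)

# Hypothetical input: two words separated by the HTML that group 1 captures.
sample = 'name\n</div>\n  <div class="large-3 small-3 columns">\n  value'

slash = pattern.sub(r':/1', sample)    # "/1" is literal text, not a group reference
backref = pattern.sub(r':\1', sample)  # "\1" inserts the text captured by group 1
bare = pattern.sub(':', sample)        # the whole match, words included, becomes ":"

print(repr(slash))  # ':/1'
print(repr(backref))
print(repr(bare))   # ':'
```

None of the three forms replaces only the group: the words matched by `[a-z]+` on either side are part of the match, so they disappear too.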

1 Answer

1) It's a backslash issue: a group reference in the replacement string is written with a backslash, not a slash.
Use `print re.sub(p, r':\1', test_str)`, not `print re.sub(p, r':/1', test_str)`.
2) Even with `:\1`, you are replacing the whole match: all of the matched text becomes `:` followed by the first group of the regex.
To replace only that group, wrap the parts before and after it in groups of their own and put those back in the replacement. I hope this fixes the issue:

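import re
test_str = u"THERE IS MY DATA"
# group 1: word before, group 2: the HTML to drop, group 3: word after
p = re.compile(ur'([a-z]+)([\n].*?<\/div>[\n ]+<div class="large-3 small-3 columns">[\n ]+)([a-z]+)', re.M|re.I|re.S)
print re.sub(p, r'\1:\3', test_str)  # keep groups 1 and 3, replace group 2 with ":"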
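A runnable Python 3 sketch of the three-group idea, assuming the goal is to keep the words on both sides and replace only the HTML between them with one `:`. Once the original group becomes group 2, the replacement has to leave it out (`\1:\3`); a replacement that still contains `\2` only inserts a colon and keeps the HTML. The two label/value pairs in the sample are hypothetical.

```python
import re

# Three groups: the word before, the HTML to drop, the word after.
pattern = re.compile(
    r'([a-z]+)([\n].*?</div>[\n ]+<div class="large-3 small-3 columns">[\n ]+)([a-z]+)',
    re.I | re.S,
)

# Hypothetical input with two label/value pairs.
sample = (
    'name\n</div>\n  <div class="large-3 small-3 columns">\n  value\n'
    'city\n</div>\n  <div class="large-3 small-3 columns">\n  paris'
)

# Keep groups 1 and 3 and drop group 2, so each HTML chunk becomes ":".
result = pattern.sub(r'\1:\3', sample)

# For comparison: keeping \2 puts the HTML back and only inserts ":".
inserted = pattern.sub(r'\1:\2\3', sample)

print(result)  # prints "name:value" and "city:paris" on separate lines
```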
  • You're right. Still, after I corrected this, it replaces the whole pattern, not only the matched part in `()`. – MastaBot Jan 28 '16 at 11:41
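The comment above runs into the fact that `re.sub` always replaces the entire match. One alternative, not used in the answer, is to turn the surrounding words into lookarounds: they are checked but not consumed, so only the HTML is matched and replaced. This is a Python 3 sketch assuming that one adjacent letter on each side is enough context; the standard `re` module only accepts fixed-width lookbehind, so `(?<=[a-z]+)` would be rejected. The sample string is hypothetical.

```python
import re

# (?<=[a-z]) and (?=[a-z]) require a letter on each side without consuming it,
# so the match, and therefore the replacement, covers only the HTML in between.
pattern = re.compile(
    r'(?<=[a-z])[\n].*?</div>[\n ]+<div class="large-3 small-3 columns">[\n ]+(?=[a-z])',
    re.I | re.S,
)

# Hypothetical input: two words separated by the HTML to replace.
sample = 'name\n</div>\n  <div class="large-3 small-3 columns">\n  value'

result = pattern.sub(':', sample)
print(result)  # name:value
```

With this form the replacement string is just `':'`, with no group references to renumber.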