I've got the string:
<u>40 -04-11</u>
How do I remove the spaces and hyphens so it returns 400411?
Currently I've got this:
(<u[^>]*>)(\-\s)(<\/u>)
But I can't figure out why it isn't working. Any insight would be appreciated.
Thanks
I've got the string:
<u>40 -04-11</u>
How do I remove the spaces and hyphens so it returns 400411?
Currently I've got this:
(<u[^>]*>)(\-\s)(<\/u>)
But I can't figure out why it isn't working. Any insight would be appreciated.
Thanks
(<u[^>]*>)(\-\s)(<\/u>)
Your pattern above doesn't tell your regex where to expect numbers.
(<u[^>]*>)(?:-|\s|(\d+))*(<\/u>)
That should get you started, but not being a python guy, I can't give you the exact replacement syntax. Just be aware that the digits are in a repeating capture group.
Edit: This is an edit in response to your comment. Like I said, not a python guy, but this will probably do what you need if you hold your tongue just right.
def repl(matchobj):
if matchobj.group(1) is None:
return ''
else:
return matchobj.group(1)
source = '<u>40 -04-11</u>40 -04-11<u>40 -04-11</u>40 -04-11'
print re.sub(r'(?:\-|\s|(\d+))(?=[^><]*?<\/u>)', repl, source)
Results in:
>>>'<u>400411</u>40 -04-11<u>400411</u>40 -04-11'
If the above offends the Python deities, I promise to sacrifice the next PHP developer I come across. :)
You don't really need a regex, you could use :
>>> '<u>40 -04-11</u>'.replace('-','').replace(' ','')
'<u>400411</u>'
Using Perl syntax:
s{
(<u[^>]*>) (.*?) (</u>)
}{
my ($start, $body, $end) = ($1, $2, $3);
$body =~ s/[-\s]//g;
$start . $body . $end
}xesg;
Or if Python doesn't have an equivalent to /e,
my $out = '';
while (
$in =~ m{
\G (.*?)
(?: (<u[^>]*>) (.*?) (</u>) | \z )
}sg
) {
my ($pre, $start, $body, $end) = ($1, $2, $3, $4);
$out .= $pre;
if (defined($start)) {
$body =~ s/[-\s]//g;
$out .= $start . $body . $end;
}
}
I'm admittedly not very good at regexes, but the way I would do this is by:
<u>...</u>
pairre.sub
on the bit between the match using group()
.That looks like this:
example_str = "<u> 76-6-76s</u> 34243vvfv"
tmp = re.search("(<u[^>]*>)(.*?)(<\/u>)",example_str).group(2)
clean_str = re.sub("(\D)","",tmp)
>>>'76676'
You should expose correctly your problem. I firstly didn't exactly understand it.
Having read your comment (only between the tags <u> and </u> tags)
, I can now propose:
import re
ss = '87- 453- kol<u>40 -04-11</u> maa78-55 98 12'
print re.sub('(?<=<u>).+?(?=</u>)',
lambda mat: ''.join(c for c in mat.group() if c not in ' -'),
ss)
result
87- 453- kol<u>400411</u> maa78-55 98 12