4

I am trying to replace certain parts of the string below.

'''<td align="center"> 5 </td> <td align="center"> 0.0001 </td>'''

I need to remove a <td> element if its content starts with '0.' (a decimal value), i.e. the output should be

'''<td align="center"> 5 </td>'''

I have tried this

data = ' '.join(data.split())
l = data.replace('<td align="center"> 0.r"\d" </td>', "")

but it didn't work. Could anyone please help me do this?

Thanks in advance

Francis Gilbert
funnyguy
  • Why do some users not accept answers? Actually, why are there *ever* questions asked that are then not accepted? Surely there can't be that many people who, after asking a question, completely lose access to the Internet forever? – Cris Stringfellow Feb 28 '12 at 10:14
  • [Obligatory reading](http://stackoverflow.com/a/1732454/566644) – Lauritz V. Thaulow Feb 28 '12 at 12:31

3 Answers

11

While both of the regular expression examples work, I would advise against using regular expressions for this.

Especially if the data is a full HTML document, you should go for an HTML-aware parser such as lxml.html, e.g.:

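from lxml import html

t = html.fromstring(text)
# The XPath is relative to the parsed root; adjust it to match your document.
tds = t.xpath("table/tbody/tr[2]/td")
for td in tds:
    # Check each cell (td, not the list tds); strip padding and skip empty cells.
    if td.text and td.text.strip().startswith("0."):
        td.getparent().remove(td)
# html.tostring() returns bytes by default; pass encoding="unicode" for a str.
text = html.tostring(t)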
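
If lxml is not available, the same parse-then-filter idea can be sketched with the standard library. This is a swapped-in technique, not the lxml code above: it uses `xml.etree.ElementTree` and assumes the cells are well-formed XML, which real-world HTML often is not. The function name `drop_decimal_cells` and the `<tr>` wrapper are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def drop_decimal_cells(row_markup):
    """Remove <td> cells whose text starts with '0.' (hypothetical helper)."""
    # Wrap the cells in a single root element so they parse as one tree.
    row = ET.fromstring("<tr>" + row_markup + "</tr>")
    # Iterate over a copy of the cell list because the row is modified in the loop.
    for td in list(row.findall("td")):
        if td.text and td.text.strip().startswith("0."):
            row.remove(td)
    # Serialize the remaining cells; strip whitespace left between cells.
    return "".join(ET.tostring(td, encoding="unicode") for td in row).strip()


cleaned = drop_decimal_cells(
    '<td align="center"> 5 </td> <td align="center"> 0.0001 </td>'
)
```

For markup that is not well-formed, an HTML-aware parser remains the safer choice.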
Kimvais
2

I would do it with a regular expression:

import re

s = "<td align='center'> 5 </td><td align='center'>0.00001</td>"
# Raw string, with the dot escaped so it matches only a literal "."
s = re.sub(r"<td align='center'>0\.\d+</td>", "", s)
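
One detail worth checking in patterns of this kind: an unescaped `.` matches any character, so a cell without a decimal point can be removed by accident. The small comparison below illustrates the difference; the sample cell `0715` is a made-up value, not taken from the question.

```python
import re

# '.' is a wildcard: it matches any single character, not just a dot.
loose = re.compile(r"<td align='center'>0.\d+</td>")
# '\.' matches only a literal dot.
strict = re.compile(r"<td align='center'>0\.\d+</td>")

cell = "<td align='center'>0715</td>"  # hypothetical cell with no decimal point

loose_hit = loose.search(cell) is not None    # the wildcard consumes the '7'
strict_hit = strict.search(cell) is not None  # no literal dot, so no match
```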
zeroos
2

You could use a regular expression to match the <td>, and then use re.sub() to replace each match with whatever you want.

import re

# No literal quotes around the pattern; the dot is escaped to match only "."
pattern = r'<td align="center"> 0\.[0-9]+ </td>'
p = re.compile(pattern)
my_string = p.sub('', my_string)

Here my_string contains the string you want to operate on. Hope this helps.
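
A fixed pattern like this depends on the exact spacing and quote style of the input. As a rough sketch, assuming the cells always carry a single `align` attribute set to `center`, the pattern can be loosened to accept either quote style and any amount of whitespace. The names `CELL` and `remove_decimal_cells` are illustrative.

```python
import re

# Optional leading whitespace, either quote style around the attribute value,
# and optional padding around the number. The dot is escaped.
CELL = re.compile(r"""\s*<td\s+align=["']center["']>\s*0\.\d+\s*</td>""")


def remove_decimal_cells(markup):
    """Drop every centered <td> whose content is a number starting with '0.'."""
    return CELL.sub("", markup)


s = '<td align="center"> 5 </td> <td align="center"> 0.0001 </td>'
result = remove_decimal_cells(s)
```

With the string from the question, `result` holds only the first cell. Markup that varies more than this (extra attributes, nested tags) is better handled by a parser.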

Darth Plagueis