
I'm iterating through pages and I'd like to modify lines containing

<span class="font16"></span>

How can I correct the code below?

text = re.sub(r'<span class="font(.*)"></span><span', r'<span class="font\1">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </span><span', text)
Ahmad Alfy
MarkF6

1 Answer


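The pattern .* is greedy: it matches as much of the line as it can, then backs off only far enough for the rest of the pattern to match. So the group can end up capturing something like this: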

16"></span>....

which isn't what you want. Use a pattern that stops at the first " (a literal " can't appear inside an attribute value that is quoted with "):

r'<span class="font([^"]+)"></span><span'
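
To see the difference, here is a minimal sketch. The sample line is made up for illustration (two empty font spans on one line), and PADDING is only a stand-in for the run of &nbsp; entities in the question; neither comes from the actual pages.

```python
import re

# Hypothetical sample line: two empty font spans, each followed by another span.
line = ('<span class="font16"></span><span>a</span>'
        '<span class="font12"></span><span>b</span>')

# Stand-in for the padding used in the question's replacement string.
PADDING = "&nbsp; " * 20

greedy = re.compile(r'<span class="font(.*)"></span><span')
bounded = re.compile(r'<span class="font([^"]+)"></span><span')

# Greedy: the group runs on to the LAST '"></span><span' on the line.
greedy_group = greedy.search(line).group(1)
print(greedy_group)

# Negated class: the group stops at the first closing quote, once per span.
bounded_groups = bounded.findall(line)
print(bounded_groups)

# Same substitution as in the question, with the corrected pattern.
fixed = bounded.sub(r'<span class="font\1">' + PADDING + r'</span><span', line)
print(fixed)
```

With the greedy pattern the whole line is treated as one match, so only one replacement happens and the captured "font size" contains markup. With [^"]+ each empty span is matched and padded separately.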
Aaron Digulla
  • Ok, now, I've got this: text = re.sub(r'{} ; where I'd like to insert the signs between font16"> and . – MarkF6 Sep 10 '13 at 08:24
  • I'm wondering why the span is empty. You should probably search for `` without the closing `` or the next span.. – Aaron Digulla Sep 10 '13 at 11:53