I want to split text in my HTML using <br> tags. If the text is longer than 50 characters, I want to replace the last space before each 10-character boundary with <br>.

The text is in <span class="value">TEXT</span>

For example <span class="value">cccc cc cccccc cccc cc c</span>

Will become: <span class="value">cccc cc<br>cccccc<br>cccc cc c</span> so every line can have at most 10 characters.

I've created a regex that can probably find such tags, but I can't figure out how to extract the text from the matched group, transform it, and put it back.

snippet = re.sub(r'<span class="value">(.*)<\/span>', 
                 r'<span class="value">\1<\/span>'.(divide text using <br> tags) 

Do you know how to do that?

Milano

2 Answers


The replacement argument of re.sub can be a function that takes a "match object" and returns the replacement string. With this you can apply any transformation to the matched text.

import re

def replace_text(m):
    # divide_text is your own function that inserts the <br> tags
    return '<span class="value">' + divide_text(m.group(1)) + '</span>'

snippet = re.sub(r'<span class="value">(.*?)</span>', replace_text, snippet)
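
One possible divide_text, as a minimal sketch: assuming the goal is "at most 10 characters per line, breaking at spaces", the standard library's textwrap.wrap can do the line splitting. The width default and the sample input are taken from the question; the body of divide_text is an assumption, not something the question or this answer specifies. With textwrap's default settings, a single word longer than the width is cut in the middle.

```python
import re
import textwrap

def divide_text(text, width=10):
    # Break at spaces so that no line is longer than `width`, then join with <br>.
    return "<br>".join(textwrap.wrap(text, width=width))

def replace_text(m):
    # m.group(1) is the text captured between the span tags.
    return '<span class="value">' + divide_text(m.group(1)) + '</span>'

snippet = '<span class="value">cccc cc cccccc cccc cc c</span>'
result = re.sub(r'<span class="value">(.*?)</span>', replace_text, snippet)
print(result)
# <span class="value">cccc cc<br>cccccc<br>cccc cc c</span>
```
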

Note that an HTML parsing library gives much better control when the input does not contain exactly the string <span class="value">, e.g.

import lxml.html

document = lxml.html.fromstring('''<html><body>
<span class="value">aaa</span>
<span class=value>bbb</span>
<span class="value-is-irrelevant">ccc</span>
<span class="value should-match-this-too">ddd</span>
</body></html>''')

# http://stackoverflow.com/q/1604471/
elements = document.xpath("//span[contains(concat(' ', @class, ' '), ' value ')]")
for element in elements:
    element.text = element.text.upper()
    # do your "divide text" here.

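# encoding='unicode' returns a str instead of bytes, so the output prints as shown below
print(lxml.html.tostring(document, encoding='unicode'))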
# <html><body>
# <span class="value">AAA</span>
# <span class="value">BBB</span>
# <span class="value-is-irrelevant">ccc</span>
# <span class="value should-match-this-too">DDD</span>
# </body></html>
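
A caveat for the "divide text" step in a parsed tree: assigning a string containing "<br>" to element.text would be escaped as text, so the line breaks have to be inserted as real elements. The following is a rough sketch of one way to do that, assuming each matching span holds plain text only (no child elements) and that lines should break at spaces with at most 10 characters each. The helper name wrap_with_br is hypothetical.

```python
import textwrap

import lxml.html
from lxml import etree

def wrap_with_br(element, width=10):
    """Split element.text into lines of at most `width` characters, separated by <br> elements.

    Assumes the element contains only text (no existing children).
    """
    lines = textwrap.wrap(element.text or "", width=width)
    if len(lines) < 2:
        return  # nothing to split
    element.text = lines[0]
    for index, line in enumerate(lines[1:]):
        br = etree.Element("br")
        br.tail = line  # the text that follows this <br>
        element.insert(index, br)

document = lxml.html.fromstring(
    '<html><body><span class="value">cccc cc cccccc cccc cc c</span></body></html>'
)
for element in document.xpath("//span[contains(concat(' ', @class, ' '), ' value ')]"):
    wrap_with_br(element)

html_out = lxml.html.tostring(document, encoding="unicode")
print(html_out)
```
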
kennytm

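This will divide the span's text every 10 characters. It cuts at fixed positions rather than at spaces, so words can be broken in the middle.

import re
def split_every_10(m):
    chunks = [m.group(1)[i:i+10] for i in range(0, len(m.group(1)), 10)]
    return '<span class="value">' + "<br>".join(chunks) + '</span>'
snippet = re.sub(r'<span class="value">(.*?)</span>', split_every_10, """<span class="value">cccc cc cccccc cccc cc c</span>""")
print(snippet)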
Neil