
I'm using Python 2.7 and Django 1.6, and my text is unicode.

I'd like to remove my own custom tag, <nospeak>, together with its contents.

For example, given this input:

INPUT:

foofoo<nospeak>barbar</nospeak>hogehoge

I want this output:

OUTPUT:

foofoohogehoge

(<nospeak>barbar</nospeak> has been removed.)

Importantly, the text can contain unicode (non-ASCII) characters.

I wrote a method for this, and it works on its own, but it doesn't work correctly when I use it in Django.

What is good practice for removing a custom tag and its contents?

FYI, here is the method I wrote:

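# -*- coding: utf-8 -*-
import re

def __make_speakable_text(text):
    # Match each <nospeak>...</nospeak> block. The non-greedy .*? stops at
    # the first closing tag; without re.DOTALL, "." does not match newlines.
    pattern = r"(<nospeak>.*?</nospeak>)"
    matches = re.findall(pattern, text)

    speakable_text = text

    if len(matches) == 0:
        print 'Not match'
    else:
        # Remove every matched block from the text, one at a time.
        for match in matches:
            # print match
            speakable_text = speakable_text.replace(match, '')

    return speakable_text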

shinriyo

1 Answer


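Try re.sub(ur'<nospeak>.*?</nospeak>', '', text). It replaces every non-overlapping match in a single call and returns the new string, so the findall() loop is not needed.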
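A minimal sketch of that approach wrapped in a function. The name make_speakable_text is hypothetical (adapted from the question's __make_speakable_text), and the re.DOTALL flag is an added assumption so that blocks spanning several lines are removed too. It uses a u'' literal instead of ur'' so it runs on both Python 2.7 and Python 3 (ur'' is a syntax error in Python 3); this pattern has no backslashes, so dropping the r prefix changes nothing.

```python
# -*- coding: utf-8 -*-
import re

# Non-greedy .*? stops at the first closing tag, so separate
# <nospeak> blocks are removed individually, not as one long span.
# re.DOTALL (an assumption) lets "." also match newlines.
NOSPEAK_RE = re.compile(u'<nospeak>.*?</nospeak>', re.DOTALL | re.UNICODE)


def make_speakable_text(text):
    """Return text with every <nospeak>...</nospeak> block removed."""
    # sub() replaces all non-overlapping matches in one pass.
    return NOSPEAK_RE.sub(u'', text)


print(make_speakable_text(u'foofoo<nospeak>barbar</nospeak>hogehoge'))
```

Called on the question's input, this prints foofoohogehoge. Compiling the pattern once at module level avoids rebuilding it on every call.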

To learn more about the u and r prefixes on the pattern, see the post: What exactly do "u" and "r" string flags do in Python, and what are raw string literals?
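A short illustration of what the r prefix does: it keeps backslashes literal, which matters once a pattern contains regex escapes such as \s. The whitespace-only <nospeak> pattern below is a hypothetical example, not something from the question.

```python
import re

# With the r prefix, backslashes are kept as-is: r'\n' is two
# characters (a backslash and "n"), while '\n' is a single newline.
assert len(r'\n') == 2
assert len('\n') == 1

# Hypothetical pattern: an empty or whitespace-only <nospeak> block.
# Without r, the backslash must be doubled to reach the regex engine.
EMPTY_NOSPEAK_RAW = r'<nospeak>\s*</nospeak>'
EMPTY_NOSPEAK_ESCAPED = '<nospeak>\\s*</nospeak>'
assert EMPTY_NOSPEAK_RAW == EMPTY_NOSPEAK_ESCAPED

# \s also matches newlines, so the whole block is removed.
result = re.sub(EMPTY_NOSPEAK_RAW, '', 'foo<nospeak> \n </nospeak>bar')
print(result)  # foobar
```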

Wiktor Stribiżew