Find and replace both quotation styles in Python unicoded string

Question

I'm trying to wrap text enclosed in either quotation mark style (“...” and "...") in a string in Python.

I've already written a regex to replace the standard quotations

print re.sub(r'\"(.+?)\"', r'<em>"\1"</em>', self.title)

When I try to do it for the literary (?) ones it doesn't replace anything.

return re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', self.title)

In fact, as I have it right now, I can't even make a conditional query:

quote_list = ['“', '”']

if all(character in self.title for character in quote_list):
    print "It has literary quotes"
    print re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', self.title)
print re.sub(r'\"(.+?)\"', r'<em>"\1"</em>', self.title)

EDIT: Further context: It's an object

class Entry(models.Model):
    title = models.CharField(max_length=200)

    def render_title(self):
        """
        This function wraps italics around quotation marks
        """
        quote_list = ['“', '”']

        if all(character in self.title for character in quote_list):
            print "It has literary quotes"
            return re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', self.title)
        return re.sub(r'\"(.+?)\"', r'<em>"\1"</em>', self.title)

I am not well-versed in regex commands. What am I doing wrong?

EDIT2: One step closer to the problem! It lies in the fact that I'm dealing with unicode strings. I'm still stumped as to how I can solve this. Any help is appreciated!

>>> title = u"sdsfgsdfgsdgfsdgs “ asd” asd"
>>> print re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', title)
sdsfgsdfgsdgfsdgs “ asd” asd
>>> title = "sdsfgsdfgsdgfsdgs “ asd” asd"
>>> print re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', title)
sdsfgsdfgsdgfsdgs <em>“ asd”</em> asd

Does `print self.title` show you the quotes correctly? Please show an example of the exact string you're running this on. — interjay, Nov 18 '15 at 14:30
Perhaps the left and right quotation marks are encoded as entities such as `“` and `”` — Mariano, Nov 18 '15 at 14:34
@interjay: See edit. I think I found the problem. It works if I use a normal string but not unicode e.g.: u"It has “literary” quotes" — Mærcos, Nov 18 '15 at 14:51

saikumarm · Answer 1 · 2015-11-18T15:20:38.607

#!/usr/bin/python
# -*- coding: utf-8 -*-
import re
quote_list = ['“', '”']
title = "“...”"

if all(character in title for character in quote_list):
    print "It has literary quotes"
    print re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', title)

Please check that your source encoding supports the characters you are using. I am using utf-8 here, which supports the quotes you used, and everything worked well.
Your if condition might never be true, so check whether it can ever be true. all() returns True only when every element is truthy.

Make sure that the pattern and the string you compare or match are of the same type. Use a unicode regex pattern against a unicode string:

quote_list = [u'“', u'”']
title = u"“...”"

if all(character in title for character in quote_list):
   print "It has literary quotes"
   print re.sub(ur'“(.+?)”', ur'<em>“\1”</em>', title)  # ur'' keeps \1 as a backreference instead of an octal escape
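
As a rough illustration of why the types have to match (a sketch added for clarity, not the answerer's code): in a utf-8 source file, a Python 2 byte string holds the curly quote as three bytes, while a unicode string holds it as one code point, so a byte pattern does not describe the same sequence as the unicode text. The names `left_quote`, `as_utf8`, `pattern` and `replacement` are made up for this example, and the quotes are written as `\u` escapes so the snippet runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import re

left_quote = u'\u201c'                  # the curly opening quote as one code point
as_utf8 = left_quote.encode('utf-8')    # the same character as utf-8 bytes

# One code point versus three bytes: the two are different sequences.
print(len(left_quote), len(as_utf8))

title = u'sdsfgsdfgsdgfsdgs \u201c asd\u201d asd'
pattern = u'\u201c(.+?)\u201d'               # unicode pattern for a unicode string
replacement = u'<em>\u201c\\1\u201d</em>'    # \\1 stays a backreference in a non-raw literal

result = re.sub(pattern, replacement, title)
print(repr(result))
```

With both the pattern and the title as unicode, the substitution wraps the quoted span as expected.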

The characters are supported. What I found was that it replaces strings well, but not unicoded strings. Using your example: u"“...”" — Mærcos, Nov 18 '15 at 14:53

score 0 · Accepted Answer · edited May 23 '17 at 12:22

I finally found an answer. After printing the variable as suggested by @interjay, I found out that the string was a unicode string.

Comparing it with a plain byte string didn't work, so I removed the conditional and, following this answer, used a raw unicode regex string (the ur prefix) to handle both plain and "literary" quotes.

title = re.sub(ur'\“(.+?)\”', ur'“<em>\1</em>”', self.title)  # notice the ur
title = re.sub(ur'\"(.+?)\"', ur'"<em>\1</em>"', title)

A comment here (unfortunately now deleted) showed how to merge the two statements above into one, but for now this works.
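
Since the deleted comment is gone, here is one possible way to do both substitutions in a single pass. This is only a sketch and not necessarily what that comment proposed: it uses one pattern with two alternatives and a replacement function that puts back whichever quote style matched. The names `QUOTED`, `emphasize_quotes` and `wrap` are hypothetical, and the curly quotes are written as `\u` escapes so the code runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
import re

# Group 1 captures text inside curly quotes, group 2 text inside straight quotes.
QUOTED = re.compile(u'\u201c(.+?)\u201d|"(.+?)"')


def emphasize_quotes(title):
    """Wrap the text inside either quote style in <em> tags."""
    def wrap(match):
        if match.group(1) is not None:
            # The curly-quote alternative matched.
            return u'\u201c<em>' + match.group(1) + u'</em>\u201d'
        # Otherwise the straight-quote alternative matched.
        return u'"<em>' + match.group(2) + u'</em>"'
    return QUOTED.sub(wrap, title)


example = emphasize_quotes(u'He said \u201chi\u201d and "bye"')
```

Using a function rather than a single replacement string keeps opening and closing quotes paired by style, so a curly opening quote is never closed with a straight one.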

Thank you very much for your help!
