
I'm trying to replace substrings delimited by either style of quotation mark (“...” and "...") within a string in Python.

I've already written a regex that replaces the standard quotation marks:

print re.sub(r'\"(.+?)\"', r'<em>"\1"</em>', self.title)

When I try the same for the literary (curly) ones, nothing gets replaced:

return re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', self.title)

In fact, as it stands, I can't even get a conditional check to work:

quote_list = ['“', '”']

if all(character in self.title for character in quote_list):
    print "It has literary quotes"
    print re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', self.title)
print re.sub(r'\"(.+?)\"', r'<em>"\1"</em>', self.title)

EDIT: Further context: the string is an attribute of a model object:

class Entry(models.Model):
    title = models.CharField(max_length=200)

    def render_title(self):
        """
        Wrap italics (<em>) around quoted text in the title.
        """
        quote_list = ['“', '”']

        if all(character in self.title for character in quote_list):
            print "It has literary quotes"
            return re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', self.title)
        return re.sub(r'\"(.+?)\"', r'<em>"\1"</em>', self.title)

I'm not well versed in regular expressions. What am I doing wrong?

EDIT2: One step closer to the problem! It lies in the fact that I'm dealing with unicode strings. I'm still stumped as to how I can solve this. Any help is appreciated!

>>> title = u"sdsfgsdfgsdgfsdgs “ asd” asd"
>>> print re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', title)
sdsfgsdfgsdgfsdgs “ asd” asd
>>> title = "sdsfgsdfgsdgfsdgs “ asd” asd"
>>> print re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', title)
sdsfgsdfgsdgfsdgs <em>“ asd”</em> asd
Mærcos

2 Answers

#!/usr/bin/python
# -*- coding: utf-8 -*-
import re
quote_list = ['“', '”']
title = "“...”"

if all(character in title for character in quote_list):
    print "It has literary quotes"
    print re.sub(r'\“(.+?)\”', r'<em>“\1”</em>', title)
  1. Please check that your encoding supports the characters you are using. Here I am using utf-8, which supports the quotes you have used, and everything worked well.
  2. Your if condition might never be true; check whether it can ever be true. all() returns True only when every element is truthy.

Ensure that wherever you compare strings or apply a regular expression, both sides use the same string type and encoding. I suggest using a unicode regexp pattern against a unicode string:

quote_list = [u'“', u'”']
title = u"“...”"

if all(character in title for character in quote_list):
    print "It has literary quotes"
    # ur'...' keeps \1 as a backreference; in a plain u'...' literal it becomes the character \x01
    print re.sub(ur'“(.+?)”', ur'<em>“\1”</em>', title)
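
One way to see why a byte-string pattern does not line up with a unicode title: the curly quote is a single code point in a unicode string but three bytes once encoded as UTF-8. A minimal sketch, assuming UTF-8 as the source encoding (the variable names are only for illustration):

```python
# -*- coding: utf-8 -*-
# Illustrative names; u'\u201c' is the left curly quote.
left_quote = u'\u201c'
encoded = left_quote.encode('utf-8')   # byte form, as in a non-unicode literal

code_points = len(left_quote)          # one code point in the unicode string
byte_count = len(encoded)              # several bytes once encoded as UTF-8
round_trip = encoded.decode('utf-8') == left_quote  # decoding restores equality
```

Decoding the byte string (or writing the pattern as a unicode literal in the first place) puts both sides in the same representation, which is what the regex needs.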
saikumarm

I finally found an answer. After printing the variable as suggested by @interjay, I found out that the string was a unicode string.

Comparing it with a plain byte string didn't work, so I removed the conditional and used this answer to simply make a unicode raw regex string that handles both simple and "literary" quotes.

title = re.sub(ur'\“(.+?)\”', ur'“<em>\1</em>”', self.title)  # notice the ur
title = re.sub(ur'\"(.+?)\"', ur'"<em>\1</em>"', title)

I've seen here in a comment (unfortunately now deleted) how one could merge the above two statements into one, but for now it works.
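
For reference, one possible way to merge the two substitutions into a single pass is an alternation with a replacement function. This is only a sketch, not the deleted comment's version: the names QUOTED and emphasize_quoted are made up for illustration, and the quotes are written as \u escapes so the result does not depend on the source file's encoding.

```python
# -*- coding: utf-8 -*-
import re

# Hypothetical names. The first alternative matches curly quotes
# (U+201C ... U+201D), the second matches straight double quotes.
QUOTED = re.compile(u'(\u201c)(.+?)(\u201d)|(")(.+?)(")')


def emphasize_quoted(title):
    """Wrap the text inside either quote style in <em> tags."""
    def wrap(match):
        # Only one alternative matches, so exactly three groups are set.
        opening, body, closing = [g for g in match.groups() if g is not None]
        return u'%s<em>%s</em>%s' % (opening, body, closing)
    return QUOTED.sub(wrap, title)
```

Because each alternative has its own opening and closing quote, a curly opening quote is never paired with a straight closing one. The title passed in is assumed to be a unicode string, as self.title turned out to be.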

Thank you very much for your help!

Mærcos