How to filter triple and double single quotes in Python?

Question

I'm trying to clean a text so that it keeps mostly letters, numbers and the most common punctuation marks. For example, I sometimes have '''words''' or ''words'', and I want to strip those repeated single quotes. So far I've chosen to use two regexes:

import re
tqre=re.compile('\'\'\'[^\']*\'\'\'') #for triple quotes
dqre=re.compile('\'\'[^\']*\'\'') #for "double" quotes

Then I strip each match:

res1=tqre.sub(self.quoteExtract,text)
res2=dqre.sub(self.quoteExtract,res1)

where:

def quoteExtract(self,match):
    return match.group().strip("'")
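A minimal standalone sketch of the approach above, assuming Python 3: `self.quoteExtract` becomes a plain function, and the names `quote_extract` and `clean` are hypothetical. The patterns are the same strings, written in double quotes so the single quotes need no escaping.

```python
import re

# Patterns from the question: opening quotes, then no single quote, then closing quotes.
tqre = re.compile("'''[^']*'''")  # triple quotes
dqre = re.compile("''[^']*''")    # "double" quotes (two single quotes)

def quote_extract(match):
    # Hypothetical stand-in for self.quoteExtract: drop the surrounding quotes.
    return match.group().strip("'")

def clean(text):
    # Hypothetical helper: triple quotes first, then double quotes.
    res1 = tqre.sub(quote_extract, text)
    return dqre.sub(quote_extract, res1)

print(clean("some '''words''' and ''more'' words"))  # some words and more words
```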

It seems to work well for triple quotes, but many double single quotes pass through; they don't seem to be caught. Is it because they are not really single quotes but some lookalike character? Is there another way to handle them?

For example, in `* ''Esquisse d'une grammaire comparée de l'arménien classique'', 1903.` the regex finds no match.
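One way to check whether the pattern, rather than the quote characters, is at fault is to search the example sentence directly. In this sketch the sentence is typed with plain ASCII apostrophes, so any failure comes from the pattern itself:

```python
import re

dqre = re.compile("''[^']*''")  # the question's double-quote pattern

text = "* ''Esquisse d'une grammaire comparée de l'arménien classique'', 1903."
# [^']* cannot step over the apostrophe in "d'une", so no match is found.
print(dqre.search(text))  # None
```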

Would you ever have an input like `'' hello '' world '' foo ''`? — MooingRawr, Oct 21 '16 at 16:11
It isn't catching it because you only match non-`'` characters inside the quotes, but there is one in `d'une`, etc. — Tadhg McDonald-Jensen, Oct 21 '16 at 16:12
Maybe I'm missing something, but wouldn't your RE be simpler if you enclosed it in double quotes? Like this: `"'''[^']*'''"`? — cdarke, Oct 21 '16 at 16:12
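The two spellings produce the identical pattern string, which is easy to check:

```python
import re

escaped = '\'\'\'[^\']*\'\'\''  # the question's spelling: every quote escaped
plain = "'''[^']*'''"           # cdarke's spelling: pattern inside double quotes
print(escaped == plain)  # True

# Either string therefore compiles to the same regex.
print(bool(re.fullmatch(plain, "'''words'''")))  # True
```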

score 3 · Accepted Answer · edited May 23 '17 at 12:19

It doesn't match because there are `'` characters (in `d'une` and `l'arménien`) between the double quotes, but your pattern only allows `[^']*`, i.e. no single quote at all, between them.
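Matching only the opening part of the question's pattern against a shortened piece of the example shows where `[^']*` gives up:

```python
import re

# Only the first half of the question's pattern: '' followed by non-quote characters.
m = re.match("''[^']*", "''Esquisse d'une grammaire''")
print(m.group())  # ''Esquisse d
```

At that point the full pattern needs a closing `''`, but the text continues with `'une`, so the whole match fails.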

This kind of regex is best expressed using the lazy quantifier:

tqre = re.compile("'''.*?'''")
dqre = re.compile("''.*?''")
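Plugged into the question's two-step substitution, the lazy patterns handle the example sentence. This is a sketch: `quote_extract` is a hypothetical plain-function stand-in for `self.quoteExtract`.

```python
import re

tqre = re.compile("'''.*?'''")  # lazy: stop at the first closing '''
dqre = re.compile("''.*?''")    # lazy: stop at the first closing ''

def quote_extract(match):
    # Hypothetical stand-in for self.quoteExtract: drop the surrounding quotes.
    return match.group().strip("'")

text = "* ''Esquisse d'une grammaire comparée de l'arménien classique'', 1903."
res1 = tqre.sub(quote_extract, text)
res2 = dqre.sub(quote_extract, res1)
print(res2)  # * Esquisse d'une grammaire comparée de l'arménien classique, 1903.
```

An alternative, not part of this answer, is to capture the inner text with `re.compile("''(.*?)''")` and substitute `r"\1"` instead of calling a `strip` callback.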

Here `.*?` matches any run of characters, but as few as possible: starting from an opening `''`, it stops at the first closing `''` rather than the last one.

`.` = any character except newline,
`*` = zero or more,
`?` after the star = non-greedy match
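The difference between greedy and non-greedy matching shows up on the input MooingRawr asked about; the last lines illustrate the newline rule, using the standard `re.DOTALL` flag that lets `.` match newlines too:

```python
import re

text = "'' hello '' world '' foo ''"

greedy = re.findall("''.*''", text)  # .* runs as far as possible, then backs off
lazy = re.findall("''.*?''", text)   # .*? stops at the first closing ''

print(greedy)  # ["'' hello '' world '' foo ''"]
print(lazy)    # ["'' hello ''", "'' foo ''"]

# By default . does not match a newline; re.DOTALL changes that.
multiline = "''a\nb''"
no_dotall = re.findall("''.*?''", multiline)
with_dotall = re.findall("''.*?''", multiline, re.DOTALL)
print(no_dotall)    # []
print(with_dotall)  # ["''a\nb''"]
```

The lazy version pairs quotes from left to right, so ` world ` ends up outside any pair; whether that is the right reading depends on the data.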

answered Oct 21 '16 at 16:12

kennytm

Also known as a *minimal match*, which can be applied to any quantifier. – cdarke Oct 21 '16 at 16:15
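A short check of that point: adding `?` after `+`, `?` or `{m,n}` makes each quantifier match as little as possible.

```python
import re

s = "aaaa"
plus_lazy = re.match("a+?", s).group()       # one or more, as few as possible
range_lazy = re.match("a{2,4}?", s).group()  # two to four, as few as possible
opt_lazy = re.match("a??", s).group()        # optional, prefers matching nothing
print(plus_lazy, range_lazy, repr(opt_lazy))  # a aa ''
```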
Great, that solved exactly my problem! Is there a way to tag the question as solved? – KimAndGumi Oct 24 '16 at 09:14
