Error applying simple regexp

Question

I had a function using regular expressions that worked perfectly:

import re

def preprocess(topic, sample, RegSample):
    topic = re.sub(RegSample, '?X?', topic, flags=re.I)  # "" «» quotes for agent X
    topic = re.sub(sample, '?X?', topic, flags=re.I)  # without the quotes
    topic = re.sub(r'[ЗАО]*[АО]О\s?X?', '?X? ', topic, flags=re.I)  # ЗАО, ОАО, ООО etc. (company legal forms) for X
    topic = re.sub(r'\?X\?\?X\?', '?X?', topic)  # doubled X agents
    topic = re.sub(r'групп[^\s]\s?X?', '?X? ', topic, flags=re.I)  # "группа" (group) of agent X

    topic = re.sub(r'\s[a-zA-Z\s\d]*[\s\.$]', ' ?Y? ', topic)  # English words + digits = agent Y
    topic = re.sub(r'[\"\«][^\"\»]*[\"\»]', '?Y?', topic, flags=re.I)  # "" «» quotes for agent Y
    topic = re.sub(r'[ЗАО][ЗАО]О\s?Y?', '?Y?', topic)  # ЗАО, ОАО, ООО etc. for Y
    topic = re.sub(r'\s[А-Я][^\s]*[\s.$]', ' ?Y? ', topic)  # replace Russian names/titles with agent Y
    topic = re.sub(r'\s[А-Я]\S*', '?Y?', topic)
    topic = re.sub(r'\s[a-zA-Z][^\s]*', ' ?Y?', topic)
    topic = re.sub(r'\?Y\?\?Y\?', '?Y?', topic)  # doubled Y agents

    topic = re.sub(r'[a-zA-Z\d\.-]*[\d][a-zA-Z\d\.-]*', '?D?', topic)  # English names with digits (not companies)
    topic = re.sub(r'[а-яА-Я\d\.-]*[\d][а-яА-Я\d\.-]*', '?D?', topic)  # Russian names with digits (not companies)
    return topic

But then I needed a few more regexes:

def final_preprocessing(topic):
    topic = re.sub('?X?', 'лол', topic)  # лол: a word encoding the company whose agent we are considering
    topic = re.sub('?Y?', 'кек', topic)  # кек: a word encoding all the other agent companies
    topic = re.sub('?D?', 'd ', topic)  # encodes all the junk marked ?D?
    return topic

And got an error: `re.error: nothing to repeat at position 0`

According to some existing answers (e.g. "Python regex strange behavior"), I had to make sure those patterns actually occur in my text. I checked, and I can say with confidence that they do. So what's the problem now?

P.S. Some of the other regexes might also match zero substrings, but they didn't raise an error!

This is not a simple regex at all. – Soviut Nov 13 '16 at 22:11

Bryan Oakley · Accepted Answer · 2016-11-13T22:06:08.627


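`?` is a quantifier that means "repeat the preceding item zero or one times". When it is the first character of the regular expression, what do you expect to be repeated zero or one times? That's what "nothing to repeat at position 0" means: at position 0 you are asking for something to repeat zero or one times, but there's nothing there to repeat. The pattern is rejected when it is compiled, before your text is searched at all, so it doesn't matter whether `?X?` appears in the text.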
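A minimal sketch of that behavior, using a made-up sample string: the error is raised while compiling the pattern, so it happens whether or not the text contains the markers, while a valid pattern that simply matches nothing is not an error.

```python
import re

# Hypothetical sample text that really does contain the markers.
text = 'agent ?X? and agent ?Y?'

try:
    re.sub('?X?', 'лол', text)
except re.error as exc:
    # Raised while compiling the pattern, before the text is ever searched.
    print(exc.msg, exc.pos)  # nothing to repeat 0

# The same failure happens with an empty input, so the text is irrelevant.
try:
    re.sub('?X?', 'лол', '')
except re.error as exc:
    print(exc)  # nothing to repeat at position 0

# A valid pattern that matches nothing is fine: the text comes back unchanged.
print(re.sub(r'\?Z\?', 'лол', text))  # agent ?X? and agent ?Y?
```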

You need to escape the question mark if you are looking for a literal question mark:

topic = re.sub(r'\?X\?', 'лол', topic)
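Applied to all three markers, a corrected `final_preprocessing` could look like the sketch below. Raw strings keep Python from warning about the invalid escape `\?` in a normal string literal, and `re.escape` is shown as an alternative that escapes a fixed marker for you. The sample sentence is made up for illustration.

```python
import re

def final_preprocessing(topic):
    # Backslash-escaped '?' matches a literal question mark instead of acting as a quantifier.
    topic = re.sub(r'\?X\?', 'лол', topic)  # лол: the company whose agent we are considering
    topic = re.sub(r'\?Y\?', 'кек', topic)  # кек: all the other agent companies
    # re.escape('?D?') builds the same kind of escaped pattern automatically.
    topic = re.sub(re.escape('?D?'), 'd ', topic)  # d: all the leftover junk
    return topic

print(final_preprocessing('?X? bought ?Y? for ?D?'))  # лол bought кек for d (with a trailing space)
```

Since the markers are fixed strings rather than patterns, plain `str.replace` (e.g. `topic.replace('?X?', 'лол')`) would also work here without involving the regex engine at all.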

edited Nov 13 '16 at 22:06

answered Nov 13 '16 at 22:05

Bryan Oakley


To reduce this to a minimal example, the error appears with just `re.compile('?')`. – Alex Hall Nov 13 '16 at 22:05
This is completely my fault. It's old work I've returned to, and I forgot part of how `regex` works. Can you please mark my question as irrelevant? – Nov 13 '16 at 22:07
@VladislavLadenkov: you have the power to delete your own question. – Bryan Oakley Nov 13 '16 at 22:08
@BryanOakley Thank you for the help! – Nov 13 '16 at 22:10
