
I want to capture `Sistemas Operativos` from `No aprobó ni está inscripto a Sistemas Operativos (Ord. 1150)` (Spanish for roughly "has neither passed nor enrolled in Operating Systems"). The ` (Ord. 1150)` part is optional: it may or may not appear.

My first try was `No aprobó ni está inscripto a (.*)( \(Ord\. 1150\))?`, but this returned `(u'Sistemas Operativos (Ord. 1150)', None)`.

So, what's the correct regex?

I'm using Python's `re` module.

Update: I don't specifically need to capture the string 'Sistemas Operativos'; it is just an example. It could be any other string, but the surrounding context (`No aprobó ni está inscripto a ... (Ord. 1150)`) will always be the same. See the comment by @DSM below.

sanfilippopablo
  • Do you really need a regex? Can't you just do `'Sistemas Operativos' in my_string`? – Dec 04 '13 at 20:47
  • Everyone seems to be thinking that you're just interested in knowing whether the phrase exists, but as I read you, you want "No aprobó ni está inscripto a THIS_IS_IN_ENGLISH (Ord. 1150)" to give "THIS_IS_IN_ENGLISH". Am I right, or am I overreading? – DSM Dec 04 '13 at 20:51
  • @DSM You're right. I didn't express the question very well. – sanfilippopablo Dec 04 '13 at 20:57
  • I've deleted my answer now that you've updated your question. – Dec 04 '13 at 21:07

2 Answers


Try

No aprobó ni está inscripto a ([^()]*)( \(Ord\. 1150\))?

Then you just need to do:

import re
myString = "No aprobó ni está inscripto a Sistemas Operativos (Ord. 1150)"
# Use a raw string so "\(" reaches the regex engine instead of being treated as a string escape
result = re.search(r'No aprobó ni está inscripto a ([^()]*)( \(Ord\. 1150\))?', myString)
course = result.group(1).strip()  # [^()]* also captures the space before "(Ord. 1150)"
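As a quick check, here is a minimal sketch that wraps the pattern above in a small helper and runs it on inputs with and without the optional suffix. The function name `extract_course` and the sample strings are hypothetical, chosen only for illustration.

```python
import re

# Pattern from the answer above; [^()]* stops at the first parenthesis,
# so the course name itself must not contain "(" or ")".
PATTERN = re.compile(r'No aprobó ni está inscripto a ([^()]*)( \(Ord\. 1150\))?')


def extract_course(text):
    """Return the course name, or None if the prefix is absent (hypothetical helper)."""
    match = PATTERN.search(text)
    if match is None:
        return None
    # [^()]* also consumes the space before "(Ord. 1150)", so strip it off.
    return match.group(1).strip()


with_suffix = "No aprobó ni está inscripto a Sistemas Operativos (Ord. 1150)"
without_suffix = "No aprobó ni está inscripto a Sistemas Operativos"

print(extract_course(with_suffix))     # Sistemas Operativos
print(extract_course(without_suffix))  # Sistemas Operativos
```

Note that with this pattern group 2 comes back as `None` even when the suffix is present: `[^()]*` has already consumed the space that ` \(Ord\. 1150\)` needs, and since `re.search` does not require the match to reach the end of the string, the engine never backtracks. If you need to know whether the suffix was there, the anchored approach in the other answer reports it.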
arturomp

The `.*` is greedy: it consumes as many characters as possible, so it swallows ` (Ord. 1150)` and leaves the optional group with nothing to match. Use `.*?` to make it lazy, which lets the subsequent ` (Ord. 1150)` group match, and add `$` at the end so the match has to cover the whole string (in a sense, to counteract the effects of `.*?`, which would otherwise be satisfied by the empty string).

>>> import re
>>> string = 'No aprobó ni está inscripto a Sistemas Operativos (Ord. 1150)'
>>> re.match(r'No aprobó ni está inscripto a (.*?)( \(Ord\. 1150\))?$',
...          string).groups()
('Sistemas Operativos', ' (Ord. 1150)')
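To see the effect of each change described above, the sketch below runs three variants of the pattern against the same string: the original greedy `(.*)`, a lazy `(.*?)` without the anchor, and the lazy version with `$`. The `variants` dictionary and its labels are illustrative names, not part of the answer.

```python
import re

text = "No aprobó ni está inscripto a Sistemas Operativos (Ord. 1150)"
prefix = r'No aprobó ni está inscripto a '
suffix = r'( \(Ord\. 1150\))?'

variants = {
    # Greedy: .* runs to the end of the string, leaving nothing for the suffix group.
    "greedy": prefix + r'(.*)' + suffix,
    # Lazy without an anchor: .*? is satisfied by the empty string, so the match stops early.
    "lazy": prefix + r'(.*?)' + suffix,
    # Lazy with $: the match must reach the end, so .*? grows until the suffix group can match.
    "lazy + $": prefix + r'(.*?)' + suffix + r'$',
}

for name, pattern in variants.items():
    print(name, re.match(pattern, text).groups())

# greedy ('Sistemas Operativos (Ord. 1150)', None)
# lazy ('', None)
# lazy + $ ('Sistemas Operativos', ' (Ord. 1150)')
```

When the suffix is missing, the anchored lazy pattern still works: `.*?` simply grows to the end of the string and the optional group comes back as `None`, giving `('Sistemas Operativos', None)`.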
voithos