
I am using this regex pattern in Python:

'("CDS)(complement)?(\()?(join)?([\(]?[<]?[0-9]{0,6}[.]{0,2}[>]?[0-9]{0,6}[,]?[\)]{0,2})*(/locus_tag=)(["])([^"]*)(["])'

To find things in a file, like this:

"CDScomplement(join(169314..169361,169451..169552,169635..169690,169833..169937,170056..170125,170277..170518,170640..170841,170968..171090,171191..171263,171387..171508))/locus_tag="MAL1P1.24" 

But something strange happens: when I use re.finditer, loop over the results, and print(matchobject.group(0)), I can see that every match object contains the complete match. But when I use print(matchobject.groups()) or print(matchobject.group(5)), group 5 comes back as an empty string. What's going on?

hwnd
RonaldN

1 Answer


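Group 5 is that long group with a * after it. A repeated capturing group does not keep all of its repetitions; it keeps only the text matched by the last one (see earlier questions on this). Every element inside your group is optional, so the last repetition can match the empty string, which is why group 5 comes back empty. Wrap that group in another set of parentheses to capture everything matched across the repetitions of the inner group: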

>>> import re
>>> # txt holds the file contents, e.g. the sample line from the question
>>> rx = re.compile(r'("CDS)(complement)?(\()?(join)?(([\(]?[<]?[0-9]{0,6}[.]{0,2}[>]?[0-9]{0,6}[,]?[\)]{0,2})*)(/locus_tag=)(["])([^"]*)(["])')
>>> [m.groups() for m in rx.finditer(txt)]
[(u'"CDS',
  u'complement',
  u'(',
  u'join',
  u'(169314..169361,169451..169552,169635..169690,169833..169937,170056..170125,170277..170518,170640..170841,170968..171090,171191..171263,171387..171508))',
  u'',
  u'/locus_tag=',
  u'"',
  u'MAL1P1.24',
  u'"')]
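
To see the "last repetition only" behaviour in isolation, here is a minimal sketch. The toy pattern, the shortened sample line, and the names (last_only, wrapped, sample, ranges) are my own illustrations, not from your file. The final step, pulling the individual start/end pairs out of the captured coordinate string with a second, simpler regex, is just one possible approach and assumes the coordinates always look like start..end:

```python
import re

# 1) A quantified group keeps only its final repetition.
last_only = re.match(r'(\d+,?)*', '1,22,333')
print(last_only.group(0))  # whole match: 1,22,333
print(last_only.group(1))  # last repetition only: 333

# 2) An outer group around the repeated group captures the whole span.
wrapped = re.match(r'((\d+,?)*)', '1,22,333')
print(wrapped.group(1))  # 1,22,333
print(wrapped.group(2))  # 333

# 3) The wrapped pattern from above, applied to a shortened
#    (hypothetical) version of the sample line.
rx = re.compile(
    r'("CDS)(complement)?(\()?(join)?'
    r'(([\(]?[<]?[0-9]{0,6}[.]{0,2}[>]?[0-9]{0,6}[,]?[\)]{0,2})*)'
    r'(/locus_tag=)(["])([^"]*)(["])'
)
sample = '"CDScomplement(join(169314..169361,169451..169552))/locus_tag="MAL1P1.24"'
match = rx.search(sample)

coords = match.group(5)     # the full coordinate string
locus_tag = match.group(9)  # the text between the quotes

# Split the captured coordinates into (start, end) integer pairs.
ranges = [(int(a), int(b)) for a, b in re.findall(r'<?(\d+)\.\.>?(\d+)', coords)]
print(locus_tag, ranges)
```

Note that adding the outer parentheses shifts the numbering of every later group by one, so the locus tag moves from group 8 to group 9.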
BrenBarn