
I have a string:

'PROTEIN\t1\t0\ttr|Q6ZN40|Q6ZN40_HUMAN\t\t-1\t;;;;;'. 

And I need to capture the text between two |. So it would be 'Q6ZN40'.

I tried this code for that:

re.match('|(.+?)|', string).group(0)

but it gives me an empty string: ''.

What is wrong with my code? I presume the regex is not correct, but I don't understand why.

Maria

1 Answer


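Use re.search instead of re.match: re.match only tries to match at the beginning of the string, while re.search scans the whole string for the first match.

An unescaped | is the alternation operator, so '|(.+?)|' means "empty string, or (.+?), or empty string". The empty first alternative matches immediately at the start, which is why you get ''. To match the pipe literally you have to escape it as \|, and the value will be in the first capturing group, .group(1). (.group(0) is the whole match, including both pipes.)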

\|(.+?)\|
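To see where the empty string comes from, here is a small sketch that runs the original pattern and the escaped one against the string from the question. The variable names (unescaped, anchored, found) are only illustrative.

```python
import re

string = "PROTEIN\t1\t0\ttr|Q6ZN40|Q6ZN40_HUMAN\t\t-1\t;;;;;"

# Unescaped: '|' is alternation, so the pattern is "empty OR (.+?) OR empty".
# The empty first alternative succeeds at position 0, giving an empty match.
unescaped = re.match('|(.+?)|', string)
print(repr(unescaped.group(0)))  # ''
print(unescaped.group(1))        # None (the group took no part in the match)

# Escaped, but still re.match: only position 0 is tried, and 'P' is not a pipe.
anchored = re.match(r'\|(.+?)\|', string)
print(anchored)                  # None

# Escaped and re.search: the whole string is scanned.
found = re.search(r'\|(.+?)\|', string)
print(found.group(1))            # Q6ZN40
```

So both changes are needed: escaping the pipe fixes the pattern, and re.search lets it match in the middle of the string.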


For example

import re

regex = r"\|(.+?)\|"
string = "PROTEIN\t1\t0\ttr|Q6ZN40|Q6ZN40_HUMAN\t\t-1\t;;;;;"
print(re.search(regex, string).group(1))

Output

Q6ZN40

A more efficient pattern uses a negated character class, [^|]+, which matches any character except |. This lets the engine consume the whole value in one step instead of expanding the lazy .+? one character at a time. The pipe does not have to be escaped inside a character class.

\|([^|]+)\|
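A minimal sketch of how the negated character class could be used in practice. The helper name between_pipes and the constant PIPE_FIELD are made up for this example; the None check is there because re.search returns None when nothing matches, and calling .group(1) on that would raise AttributeError.

```python
import re

# Precompiled pattern: a pipe, one or more non-pipe characters, a pipe.
PIPE_FIELD = re.compile(r"\|([^|]+)\|")

def between_pipes(text):
    """Return the text between the first pair of pipes, or None if there is none."""
    match = PIPE_FIELD.search(text)
    return match.group(1) if match else None

print(between_pipes("PROTEIN\t1\t0\ttr|Q6ZN40|Q6ZN40_HUMAN\t\t-1\t;;;;;"))  # Q6ZN40
print(between_pipes("no pipes here"))                                      # None
```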
The fourth bird