1

I have a column with following values in a postgres table.

col1
uniprotkb:P62158(protein(MI:0326), 9606 - Homo sapiens)
uniprotkb:O00602-PRO_0000009136(protein(MI:0326), 9606 - Homo sapiens)

I would like to extract a value from above column values.

col2
P62158
O00602

I am using following regexp match on my column

select 

        uniprotkb:(.*)\-|\([a-zA-Z].* as col2

from table;

But the above regexp capture the text before the last '-'. I want to capture the text between uniprotkb: and before the first occurence of either '(' or '-'. Any suggestion here would be helpful.

rshar
  • 1,381
  • 10
  • 28

1 Answers1

1

You may use

uniprotkb:(.*?)[-(][a-zA-Z]
           ^^^ ^^^^

See the regex demo.

Details

  • uniprotkb: - a literal string
  • (.*?) - Group 1: any 0+ chars as few as possible
  • [-(] - a - or (
  • [a-zA-Z] - a letter.

PostgresSQL test:

SELECT (REGEXP_MATCHES (
      'uniprotkb:P62158(protein(MI:0326), 9606 - Homo sapiens)',
      'uniprotkb:(.*?)[-(][a-zA-Z]'
   ))[1]

Outputs:

enter image description here

Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563