7

I have a field in a redshift column that looks like the following:

abcd1234df-TEXT_I-WANT

the characters and numbers in the first 10 digits can be either letters or numbers.

If I use a capture group regex, I would use a poorly written expression like (\w\w\w\w\w\w\w\w\w\w\W)(.*) and grap the 2nd group

But I'm having trouble implementing this in redshift, so not sure how I can grab only the stuff after the first hyphen

John Rotenstein
  • 241,921
  • 22
  • 380
  • 470
user1874064
  • 263
  • 1
  • 3
  • 10

5 Answers5

11

As mentioned before, regex might be an overkill. However, it could be useful in some cases.

Here's a basic replace pattern:

SELECT
    regexp_replace(
      'abcd1234df-TEXT_I-WANT'  -- use your input column here instead
    , '^[a-z0-9]{10}-(.*)$'     -- matches whole string, captures "TEXT_I-WANT" in $1
    , '$1'                      -- inserts $1 to return TEXT_I-WANT
    )
;
wp78de
  • 18,207
  • 7
  • 43
  • 71
4

@wp78de gives a very good advice to use REGEX_REPLACE. I allows you to choose the capture group. Using your regex, it would look like that, although you don't need 2 groups in here and using one is sufficient here.

select 
  regexp_replace(
    'abcd1234df-TEXT_I-WANT',
    '(\\w\\w\\w\\w\\w\\w\\w\\w\\w\\w\\W)(.*)', 
    '$2' -- replacement selecting 2nd capture group
  );

Another oprion, although less flexible is using REGEX_SUBSTR with e parameter set (Extract a substring using a subexpression). It allows you to select a substring, but only of a first capture group in your regex. You also have to set the position and occurence parameters to default 1:

Using REGEX you suggested, but only with 1 group:

select 
  regexp_substr(
    'abcd1234df-TEXT_I-WANT',
    '\\w\\w\\w\\w\\w\\w\\w\\w\\w\\w\\W(.*)', 
    1, -- position
    1, -- occurrence
    'e' -- parameters
  );
botchniaque
  • 4,698
  • 3
  • 35
  • 63
3

Regular expressions might be overkill. Basic string operations are good enough:

select substring(col from position('-' in col) + 1)
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
1

This can be done with charindex and substring.

substring(col,charindex('-',col)+1)
Vamsi Prabhala
  • 48,685
  • 4
  • 36
  • 58
0

You can get the desired string by using the common table expression with regexp_substr as below :

with cte as
(
 select 'abcd1234df-TEXT_I-WANT' as str   
)    
select regexp_substr(str,'-.*') 
       as derived_str
  from cte;

derived_str
-------------
-TEXT_I-WANT  
Barbaros Özhan
  • 59,113
  • 10
  • 31
  • 55