Teradata regular expressions, 0 or 1 spaces

Question

In Teradata, I'm looking for one regular expression pattern that would allow me to find a pattern of some numbers, then a space or maybe no space, and then 'SF'. It should return 7 in both cases below:

SELECT
REGEXP_INSTR('12345 1000SF', pattern),
REGEXP_INSTR('12345 1000 SF', pattern)

Or, my actual goal is to extract the 1000 in both cases if there's an easier way, probably using REGEXP_SUBSTR. More details are below if you need them.

I have a column that contains free text and I would like to extract the square footage. But, in some cases, there is a space between the number and 'SF' and in some cases there is not:

'other stuff 1000 SF'
'other stuff 1000SF'

I am trying to use the REGEXP_INSTR function to find the starting position. Through google, I have found the pattern for the first to be

'([0-9])+ SF'

When I try the pattern for the second, I try

'([0-9])+SF'

and I get the error

SELECT Failed.  [2662] SUBSTR: string subscript out of bounds

I've also found an answer to a similar questions, but they don't work for Teradata. For example, I don't think you can use ? in Teradata.

Try `'[0-9]+SF'`. There is no point putting the character class into a group and quantify the group instead of the character class atom. — Wiktor Stribiżew, Aug 18 '17 at 19:00
@WiktorStribiżew I still get the same error as if I include the (). There may be something else going on causing the error. — GeoffDS, Aug 18 '17 at 19:02
If you can't use `?`, you're going to have to just match one pattern or the other with something like `[0-9]+SF|[0-9]+ SF`. — CAustin, Aug 18 '17 at 19:03
Ok, try this one - `'[0-9][0-9]* ?SF'` to match both cases. Or `'[0-9][0-9]*( |)SF'` to avoid issues with `?`. — Wiktor Stribiżew, Aug 18 '17 at 19:03
@WiktorStribiżew In Teradata, ?Name represents a parameter, so ? is not allowed as far as I can tell. — GeoffDS, Aug 18 '17 at 19:05
@WiktorStribiżew That still gives me the 2662 error which I don't understand. — GeoffDS, Aug 18 '17 at 19:08
[This error occurs when a SUBSTR is used, and either the start position used is beyond the end of the string, the start offset plus the specified length is greater than the string length, or the specified length is negative.](http://teradataerror.com/2662-SUBSTR-string-subscript-out-of-bounds.html) — Wiktor Stribiżew, Aug 18 '17 at 19:13
@WiktorStribiżew: Your link is to a page which simply copies error messages 1:1 from the Teradata manuals and adds some ads :-) — dnoeth, Aug 18 '17 at 21:04
I think the error must have been a substr function I had in the same query. But, confusingly, it only caused an error when I would change the regexp_instr pattern to have no space. This happened several times, trying the pattern without a space caused an error, with a space only, no error. — GeoffDS, Aug 18 '17 at 22:18

linden2015 · Answer 1 · 2017-08-18T19:25:23.430

2

I would pattern it like this:

\b(\d+)\s*[Ss][Ff]\b

\b    # word boundary
(\d+) # 1 or more digits (captured)
\s*   # 0 or more white-space characters
[Ss]  # character class
[Ff]  # character class
\b    # word boundary

Demo

edited Aug 18 '17 at 19:25

answered Aug 18 '17 at 19:18

linden2015

887
7
9

Thanks so much for this answer. It has been helpful as well. – GeoffDS Aug 19 '17 at 12:43

score 2 · Accepted Answer · answered Aug 18 '17 at 19:30

2

The error message indicates you're using SUBSTR, not REGEXP_SUBSTR.

Try this:

RegExp_Substr(col, '[0-9]*(?= {0,1}SF)')

Find multiple digits followed by a single optional blank followed by SF and extract those digits.

answered Aug 18 '17 at 19:30

dnoeth

59,503
4
39
56

What is it that makes the REGEXP_SUBSTR function return only the number here? I'm looking at the documentation online and it doesn't have examples like this where it finds a pattern and grabs a portion of the pattern. This is extremely powerful. – GeoffDS Aug 18 '17 at 19:49
That's exactly what `regexp_substr` does - grab a portion of string based on the regex. – Andrew Aug 18 '17 at 20:41
The `(?=....)` is a so-called look-ahead, i.e. check for the pattern without adding it to the result. Btw, Teradata supports "Perl Compatible Regular Expressions" (PCRE) syntax, there's not much docu, but lots of online resources. – dnoeth Aug 18 '17 at 20:42
Is there a look behind? Because, I now want to do something a bit different where I get the end part of a pattern. I have already tried using the (?= ) around the earlier part of the pattern and it doesn't work. – GeoffDS Aug 18 '17 at 22:15
@Graphth: `(?<=...)` is a look-behind – dnoeth Aug 18 '17 at 22:32
This is exactly what I was looking for, extremely helpful. I also appreciate the extra bit of info on PCRE so I can look up more later. I have found your answers on here and on Teradata forums helpful many times. In fact, I started with these REGEXP functions in the last couple weeks when I found your answer on compressing multiple spaces into one space. – GeoffDS Aug 19 '17 at 12:43
@Graphth: I'm still a beginner in regular expressions :-) – dnoeth Aug 19 '17 at 13:03
@dnoeth You're not as much of a beginner as me, that's for sure. FYI, I asked another question about the look-behind because I'm now getting an error when I try that. – GeoffDS Aug 22 '17 at 13:02

Teradata regular expressions, 0 or 1 spaces

2 Answers2