Get everything after a string pattern and before a ' ' in Databricks SQL

Question

I got the following entry in my database with column name - properties_desc:

#Thu Sep 03 02:18:11 UTC 2020 cardType=MasterCard cardDebit=true cardUniqueNumber=f0b03da93bc70fbc194a5a4ef5879685

I want to trim the entry so I get: MasterCard

So basically, I want everything after 'cardType=' and before ''.

I tried referring this Get everything after and before certain character in SQL Server but this works for a special character and not a string.

My try:

SUBSTRING(properties_desc, length(SUBSTRING(properties_desc, 0, length(properties_desc) - CHARINDEX ('cardType=', properties_desc))) + 1, 
    length(properties_desc) - length(SUBSTRING(properties_desc, 0, length(properties_desc) - CHARINDEX ('cardType=', properties_desc))) - length(SUBSTRING(
    properties_desc, CHARINDEX (' ', properties_desc), length(properties_desc))))

But the above query does not work. Any help is appreciated. How can I solve it?

score 1 · Accepted Answer · answered Jan 09 '22 at 04:38

1

You have tagged this question as both sql-server and databricks. Based on your use of length() instead of len(), I assume that you are using databricks. In that case, you can make use of the regexp_extract() function

Try: "regexp_extract(properties_desc, '(?<=cardType=)[^ ]*')".

This is untested, as I am not a databricks programmer.

The "[^ ]*" in the above will match and extract a string of non-space characters after "cardType=". The "(?<=...)" is a "look-behind" construct that requires that the matched text be preceded by "cardType=", but does not include that text in the result. The end result is that the regex matches and extracts everything after "cardtype=" up to the next space (or the end of the string).

Regular expressions are a pretty powerful string matching tool. Well worth learning if you are not already familiar with them. (I wish SQL Server had them.)

answered Jan 09 '22 at 04:38

T N

4,322
1
5
18

Good eye. I zoomed right over len() vs length() +1 – John Cappelletti Jan 09 '22 at 04:40
@JohnCappelletti thanks for your previous query, I did change the len() to length() ! I believe the data_type might be an issue as its is string and not varchar. – Rupesh Ghosh Jan 09 '22 at 04:49
Thanks @T N let me take a look at reg_ex – Rupesh Ghosh Jan 09 '22 at 04:50
The query - regexp_extract(properties_desc, '(?<=cardType=)[^ ]*',0) works but gives me result until the end of the string, is there a way to restrict it until first space? – Rupesh Ghosh Jan 09 '22 at 05:07
The [^ ] (with a space before the closing bracket) should not match past the space. Is it possible that your space is not a plain ASCII space? A tab, non-breaking space, or some encoding that is not noticeable when you look at it? Another possibility is that databricks regex engine has their ignore whitespace option set by default. You can try replacing the space with \x20 or \s. \x20 is the hex encoding for space. \s matches any whitespace character. – T N Jan 10 '22 at 19:29

Get everything after a string pattern and before a ' ' in Databricks SQL

1 Answers1