1

I am using the following SQL to create a function to strip out characters or numbers.

CREATE FUNCTION [dbo].[fn_StripCharacters]
(
    @String NVARCHAR(MAX), 
    @MatchExpression VARCHAR(255)
)
RETURNS NVARCHAR(MAX)
AS
BEGIN
    SET @MatchExpression =  '%['+@MatchExpression+']%'

    WHILE PatIndex(@MatchExpression, @String) > 0
        SET @String = Stuff(@String, PatIndex(@MatchExpression, @String), 1, '')

    RETURN @String

END

I am calling it like this to extract the alpha characters and then the numeric characters into two sort fields:

SELECT
    ...
    (SELECT dbo.fn_StripCharacters(PD.District, '^a-z')) AS Sort1,
    CAST((SELECT dbo.fn_StripCharacters(PD.District, '^0-9')) AS INT) AS Sort2,
    ...
FROM

I am searching a LOT of Postcode District Records and the constant calls to the function are causing a comparatively big delay. Is there a way of reproducing this functionality without a function call? Can it be incorporated into the SELECT statement somehow?

SQL Server Version is 2017

O. Jones
  • 103,626
  • 17
  • 118
  • 172
Coilover
  • 49
  • 5
  • 1
    While asking a question, you need to provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example): (1) DDL and sample data population, i.e. CREATE table(s) plus INSERT T-SQL statements. (2) What you need to do, i.e. logic and your code attempt implementation of it in T-SQL. (3) Desired output, based on the sample data in the #1 above. (4) Your SQL Server version (SELECT @@version;). – Yitzhak Khabinsky Dec 20 '21 at 19:32
  • 2
    Fundamentally SQL Server is very bad at this... so you might get a faster solution not in SQL Server, or in a CLR function. – Dale K Dec 20 '21 at 19:33
  • 2
    Also - "18.9.2" is not a valid **SQL Server** version - the latest is SQL Server 2019 - which is **v15.x.y.z** - you're most likely talking about the version of SQL Server Mgmt Studio - which is just the GUI tool to talk to SQL Server but its version is utterly different from the actual SQL Server version. Run `SELECT @@VERSION` inside SSMS and report back what version of SQL Server you really have .... – marc_s Dec 20 '21 at 20:00
  • You would be better off doing this in an *inline table valued function*. Please provide your correct SQL Server version. Sample data and expected results also work well for SQL questions. – Stu Dec 20 '21 at 20:22
  • Cool use of replace & translate. Using string_split & string_agg in Sql2019 is (surprisingly) still faster by a smidge. – Stu Dec 20 '21 at 21:52
  • `TRANSATE()` and `REPLACE()` is an option. My own experience is summarized [here](https://stackoverflow.com/questions/1007697/how-to-strip-all-non-alphabetic-characters-from-string-in-sql-server/65825787#65825787). – Zhorov Dec 21 '21 at 08:57

2 Answers2

1

You can try something like this:

 Create Function [dbo].[fnCleanString] (
        @inputString varchar(8000)
      , @stringPattern varchar(50) = '[0-9a-zA-Z]'
        )
Returns table
   With schemabinding
     As
 Return

   With t(n)
     As (
 Select t.n 
   From (
 Values (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)
      , (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) As t(n)
        )
      , iTally (n)
     As (
 Select Top (len(@inputString)) 
        checksum(row_number() over(Order By @@spid)) 
   From t t1, t t2, t t3                                -- 8000 rows
        )
 Select v.inputString
      , outputString = (Select substring(v.inputString, it.n, 1)
                          From iTally                     it
                         Where substring(v.inputString, it.n, 1) Like @stringPattern
                           For xml Path(''), Type).value('.', 'varchar(8000)')
   From (Values (@inputString)) As v(inputString);
GO

And call it like this:

Declare @testData table (AlphaNumeric varchar(100));
 Insert Into @testData (AlphaNumeric)
 Values ('a1b2c3d4'), ('5e6f7g8i');

 Select *
   From @testData                       td
  Cross Apply dbo.fnCleanString(td.AlphaNumeric, '[^A-Z]') cs
  Cross Apply dbo.fnCleanString(td.AlphaNumeric, '[^0-9]') cs2;

This works with VARCHAR - but you can easily change it to return NVARCHAR if that is what you really need. Just be aware that it is written to work with a fixed length string and the maximum size for NVARCHAR is 4000.

You also need to be aware of collation and how that will affect the results.

Jeff
  • 512
  • 2
  • 8
1

How to extract alpha or numbers from a string within a SELECT statement WITHOUT a function call?

You can't, because simple string operations like REPLACE() are also function calls. And, user-defined functions are compiled by SQL Server. They perform pretty well.

But your problem is not the function's performance itself, but rather how often you must use it. You knew that.

Here's a possible way to speed up your postcode-grinding task: put persisted computed columns on your table. You can even index them.

Here's how to do that.

  1. Tell SQL Server to use schemabinding with your stored function. It needs to know that a table definition in your schema depends on the function. To do that add WITH SCHEMABINDING to your function definition.

    ...
    RETURNS NVARCHAR(MAX)
    WITH SCHEMABINDING
    AS
    BEGIN
    ...
    
  2. Add two computed, persisted, columns to your table.

    ...
    ALTER TABLE postcode
           ADD letters   
           AS (dbo.fn_StripCharacters(postcode, '^A-Z'))
           PERSISTED;
     ALTER TABLE dbo.postcode
           ADD numbers
           AS (CAST(dbo.fn_StripCharacters(postcode, '^0-9') AS INT)) 
           PERSISTED;
    
  3. You can put indexes on the computed columns if you need them.

    CREATE INDEX numbers ON postcode  (numbers DESC)
    
  4. Now you can insert, update, or delete your non-computed columns as you wish. SQL Server evaluates your stored functions just once for each row as it is inserted or updated. Your functions still get evaluated, but not when you SELECT from your table.

  5. And you can use the computed columns as you wish

    SELECT * FROM postcode ORDER BY numbers DESC
    

Here's a db<>fiddle demonstrating this.

O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • I was coming round to the idea of adding columns to the database rather than splitting the District on the Query but this is an extremely interesting and useful answer and has educated me well beyond the initial performance issue I was trying to solve. I would have never thought to use computed values in this way. – Coilover Dec 22 '21 at 12:34