A bit late to the party here, but as a general rule I despise all functions with BEGIN .. END
, they almost never perform well, and since this covers all scalar functions (until Microsoft implement inline scalar expressions), as such whenever I see one I look for an alternative that offers similar reusability. In this case the query can be converted to an inline table valued function:
CREATE FUNCTION dbo.RemoveNonAlphaCharactersTVF (@String NVARCHAR(1000), @Length INT)
RETURNS TABLE
AS
RETURN
( WITH E1 (N) AS
( SELECT 1
FROM (VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) n (N)
),
E2 (N) AS (SELECT 1 FROM E1 CROSS JOIN E1 AS E2),
N (Number) AS (SELECT TOP (LEN(@String)) ROW_NUMBER() OVER(ORDER BY E1.N) FROM E2 CROSS JOIN E1)
SELECT Result = ( SELECT TOP (ISNULL(@Length, 1000)) SUBSTRING(@String, n.Number, 1)
FROM N
WHERE SUBSTRING(@String, n.Number, 1) LIKE '[a-Z]'
ORDER BY Number
FOR XML PATH('')
)
);
All this does is use a list of numbers to expand the string out into columns, e.g. RA123 *JU23T
becomes:
Letter
------
R
A
1
2
3
*
J
U
2
3
T
The rows that are not alphanumeric are then removed by the where clause:
WHERE SUBSTRING(@String, n.Number, 1) LIKE '[a-Z]'
Leaving
Letter
------
R
A
J
U
T
The @Length
parameter then limits the characters (in your case this would be 4), then the string is rebuilt using XML concatenation. I would usually use FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)')
for xml concatenation to allow for xml characters, but since I know there are none I haven't bothered as it is additional overhead.
Running some tests on this with a sample table of 1,000,000 rows:
CREATE TABLE dbo.T (String NVARCHAR(1000));
INSERT T (String)
SELECT TOP 1000000 t.String
FROM (VALUES ('Nagpur District'), ('Ill Fated'), ('RA123 *JU23'), ('MAC')) t (String)
CROSS JOIN sys.all_objects a
CROSS JOIN sys.all_objects B
ORDER BY a.object_id;
Then comparing the scalar and the inline udfs (called as follows):
SELECT COUNT(SUBSTRING(dbo.RemoveNonAlphaCharacters(t.String), 1, 4))
FROM T;
SELECT COUNT(tvf.Result)
FROM T
CROSS APPLY dbo.RemoveNonAlphaCharactersTVF (t.String, 4) AS tvf;
Over 15 test runs (probably not enough for an accurate figure, but enough to paint the picture) the average execution time for the scalar UDF was 11.824s, and for the inline TVF was 1.658, so approximately 85% faster.