0

I am looking for solution to get a character based string extracted from another string. I need only first 4 "characters only" from another string. The restriction here is that "another" string may contain spaces, special characters, numbers etc and may be less than 4 characters.

For example - I should get

  1. "NAGP" if source string is "Nagpur District"

  2. "ILLF" if source string is "Ill Fated"

  3. "RAJU" if source string is "RA123 *JU23"

  4. "MAC" if source string is "MAC"

Any help is greatly appreciated.

Thanks for sharing your time and wisdom.

IrfanRaza
  • 3,030
  • 17
  • 64
  • 89
  • Have you attempted to solve this yourself in any way yet? – Tanner Oct 22 '14 at 11:07
  • Just use a solution like this to remove non-alpha characters: http://stackoverflow.com/questions/1007697/how-to-strip-all-non-alphabetic-characters-from-string-in-sql-server, and then take the first 4 characters. – Tanner Oct 22 '14 at 11:10

3 Answers3

1
^([a-zA-Z])[^a-zA-Z\n]*([a-zA-Z])?[^a-zA-Z\n]*([a-zA-Z])?[^a-zA-Z\n]*([a-zA-Z])?

You can try this.Grab the captures or groups.See demo.

http://regex101.com/r/rQ6mK9/42

vks
  • 67,027
  • 10
  • 91
  • 124
1

You can use the answer in the question and add substring method to get your value of desired length How to strip all non-alphabetic characters from string in SQL Server?

i.e.

Create Function [dbo].[RemoveNonAlphaCharacters](@Temp VarChar(1000))
Returns VarChar(1000)
AS
Begin

    Declare @KeepValues as varchar(50)
    Set @KeepValues = '%[^a-z]%'
    While PatIndex(@KeepValues, @Temp) > 0
        Set @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '')

    Return @Temp
End

use it like

Select SUBSTRING(dbo.RemoveNonAlphaCharacters('abc1234def5678ghi90jkl'), 1, 4);

Here SUBSTRING is used to get string of length 4 from the returned value.

Community
  • 1
  • 1
Satyajit
  • 2,150
  • 2
  • 15
  • 28
0

A bit late to the party here, but as a general rule I despise all functions with BEGIN .. END, they almost never perform well, and since this covers all scalar functions (until Microsoft implement inline scalar expressions), as such whenever I see one I look for an alternative that offers similar reusability. In this case the query can be converted to an inline table valued function:

CREATE FUNCTION dbo.RemoveNonAlphaCharactersTVF (@String NVARCHAR(1000), @Length INT)
RETURNS TABLE
AS
RETURN
(   WITH E1 (N) AS 
    (   SELECT 1 
        FROM (VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) n (N)
    ), 
    E2 (N) AS (SELECT 1 FROM E1 CROSS JOIN E1 AS E2), 
    N (Number) AS (SELECT TOP (LEN(@String)) ROW_NUMBER() OVER(ORDER BY E1.N) FROM E2 CROSS JOIN E1)
    SELECT  Result = (  SELECT  TOP (ISNULL(@Length, 1000)) SUBSTRING(@String, n.Number, 1)
                        FROM    N
                        WHERE   SUBSTRING(@String, n.Number, 1) LIKE '[a-Z]'
                        ORDER BY Number
                        FOR XML PATH('')
                    )
);

All this does is use a list of numbers to expand the string out into columns, e.g. RA123 *JU23T becomes:

Letter
------
R
A
1
2
3

*
J
U
2
3
T

The rows that are not alphanumeric are then removed by the where clause:

WHERE   SUBSTRING(@String, n.Number, 1) LIKE '[a-Z]'

Leaving

Letter
------
R
A
J
U
T

The @Length parameter then limits the characters (in your case this would be 4), then the string is rebuilt using XML concatenation. I would usually use FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)') for xml concatenation to allow for xml characters, but since I know there are none I haven't bothered as it is additional overhead.

Running some tests on this with a sample table of 1,000,000 rows:

CREATE TABLE dbo.T (String NVARCHAR(1000));
INSERT T (String)
SELECT TOP 1000000 t.String
FROM    (VALUES ('Nagpur District'), ('Ill Fated'), ('RA123 *JU23'), ('MAC')) t (String)
        CROSS JOIN sys.all_objects a
        CROSS JOIN sys.all_objects B
ORDER BY a.object_id;

Then comparing the scalar and the inline udfs (called as follows):

SELECT  COUNT(SUBSTRING(dbo.RemoveNonAlphaCharacters(t.String), 1, 4))
FROM    T;

SELECT  COUNT(tvf.Result)
FROM    T
        CROSS APPLY dbo.RemoveNonAlphaCharactersTVF (t.String, 4) AS tvf;

Over 15 test runs (probably not enough for an accurate figure, but enough to paint the picture) the average execution time for the scalar UDF was 11.824s, and for the inline TVF was 1.658, so approximately 85% faster.

GarethD
  • 68,045
  • 10
  • 83
  • 123