How to get all the two characters long substrings separated by dot (.) from an email address in SQL server. I want Scalar function

Question

I have one email column that is having values like this 'claudio.passerini@uni.re.dit.mn.us'. I want to take two characters strings between dot (to check for the countries and states codes).

i want result like this

col1=re,mn,us

What database do you use ?, SQL Server, Oracle, MySQL, ... ? — Marc Guillot, Nov 18 '16 at 07:43
I want to comapare with country code and state code. Thats why i want only two characters between the dot(.). — Krish, Nov 18 '16 at 07:48
Did you try using the [CHARINDEX](https://msdn.microsoft.com/en-us/library/ms186323.aspx) or [STRING_SPLIT](https://msdn.microsoft.com/en-us/library/mt684588.aspx) functions? — Jeroen Heier, Nov 18 '16 at 08:08

Marc Guillot · Answer 1 · 2016-11-18T14:39:03.293

You can create your own function to split strings.

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION [dbo].[fnSplitString] 
( 
    @string NVARCHAR(MAX), 
    @delimiter CHAR(1) 
) 
RETURNS @output TABLE(splitdata NVARCHAR(MAX) 
) 
BEGIN 
    set @delimiter = coalesce(@delimiter, dbo.cSeparador());

    DECLARE @start INT, @end INT 
    SELECT @start = 1, @end = CHARINDEX(@delimiter, @string) 
    WHILE @start < LEN(@string) + 1 BEGIN 
        IF @end = 0  
            SET @end = LEN(@string) + 1

        INSERT INTO @output (splitdata)  
        VALUES(SUBSTRING(@string, @start, @end - @start)) 
        SET @start = @end + 1 
        SET @end = CHARINDEX(@delimiter, @string, @start)

    END 
    RETURN 
END

Using this function you can get all your country&state codes :

select splitdata from dbo.fnSplitString('claudio.passerini@uni.re.dit.mn.us', '.')
where len(splitdata) = 2

You can modify that query to concatenate the result on a single string :

 SELECT
 STUFF((SELECT ',' + splitdata
        FROM dbo.fnSplitString('claudio.passerini@uni.re.dit.mn.us', '.')
        WHERE len(splitdata) = 2
        FOR XML PATH('')), 1, 1, '')

Here is how you put it into an scalar function :

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION [dbo].[fnCountryCodes](@email nvarchar(max)) returns nvarchar(max)
AS
BEGIN
    RETURN (SELECT
            STUFF((SELECT ',' + splitdata
                   FROM dbo.fnSplitString(@email, '.')
                   WHERE len(splitdata) = 2
                   FOR XML PATH('')), 1, 1, ''));
END

You call it like this :

select dbo.fnCountryCodes('claudio.passerini@uni.re.dit.mn.us')

Alternatively you can create a table-valued function that returns all the 2 characters long substrings from the domain of a mail address :

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION [dbo].[fnCountryCodes] (@email NVARCHAR(MAX))   
RETURNS @output TABLE(subdomain1 nvarchar(2), subdomain2 nvarchar(2), subdomain3 nvarchar(2), subdomain4 nvarchar(2), subdomain5 nvarchar(2)) 
as
BEGIN 
  DECLARE @subdomain1 nvarchar(2);
  DECLARE @subdomain2 nvarchar(2);
  DECLARE @subdomain3 nvarchar(2);
  DECLARE @subdomain4 nvarchar(2);
  DECLARE @subdomain5 nvarchar(2);

  DECLARE CURSOR_SUBDOMAINS CURSOR FOR select splitdata from dbo.fnSplitString(@email, '.') where len(splitdata) = 2;
  OPEN CURSOR_SUBDOMAINS;
  FETCH NEXT FROM CURSOR_SUBDOMAINS INTO @subdomain1;
  FETCH NEXT FROM CURSOR_SUBDOMAINS INTO @subdomain2;
  FETCH NEXT FROM CURSOR_SUBDOMAINS INTO @subdomain3;
  FETCH NEXT FROM CURSOR_SUBDOMAINS INTO @subdomain4;
  FETCH NEXT FROM CURSOR_SUBDOMAINS INTO @subdomain5;
  CLOSE CURSOR_SUBDOMAINS;
  DEALLOCATE CURSOR_SUBDOMAINS;

  INSERT INTO @output (subdomain1, subdomain2, subdomain3, subdomain4, subdomain5) 
              values (@subdomain1, @subdomain2, @subdomain3, @subdomain4, @subdomain5)
  RETURN 
END

You use it like that :

select * from dbo.fnCountryCodes('claudio.passerini@uni.re.dit.mn.us')

After from keyword why you're using function name instead of table name? — Krish, Nov 18 '16 at 08:22
This is a Table-Valued function, it returns a dataset (the list of substrings from your string) https://technet.microsoft.com/en-us/library/ms191165(v=sql.105).aspx — Marc Guillot, Nov 18 '16 at 08:26
That's very easy, now that you have split it you just have to use a cursor to loop through that dataset so you can return it as you prefer. — Marc Guillot, Nov 18 '16 at 08:52
@Krish You want to avoid cursors as much as possible. They almost always perform worse than a proper set based approach. — iamdave, Nov 18 '16 at 10:41
Ok, here you have it. It still is a table-valued function, because an scalar function only returns single values, but now it returns all the strings in a single row. — Marc Guillot, Nov 18 '16 at 10:47
I have updated the solution to return the result on a single string, as demanded on your updated question. Now you have both possibilities: an scalar function and a table-valued function. — Marc Guillot, Nov 18 '16 at 14:20

score 0 · Answer 2 · answered Nov 18 '16 at 10:40

If yo want individual columns you will need to pivot your data after splitting out your strings with a table valued function as per Marc's answer. If you are happy having them in rows, you can just use the select statement inside the brackets.

Query to get the data

declare @t table (Email nvarchar(50));
insert into @t values('claudio.passerini@uni.re.dit.mn.us'),('claudio.passerini@uni.ry.dit.mn.urg'),('claudio.passerini@uni.rn.dit.mn.uk');

select Email
        ,[1]
        ,[2]
        ,[3]
        ,[4]
        ,[5]
        ,[6]
from(
    select t.Email
            ,s.Item
            ,row_number() over (partition by t.Email order by s.Item) as rn
    from @t t
        cross apply dbo.DelimitedSplit8K(t.Email,'.') s
    where len(s.Item) = 2
) a
pivot
(
    max(Item) for rn in([1],[2],[3],[4],[5],[6])
) pvt

Table valued function to split out the strings, courtesy of Jeff Moden

http://www.sqlservercentral.com/articles/Tally+Table/72993/

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER FUNCTION [dbo].[DelimitedSplit8K]
--===== Define I/O parameters
        (@pString VARCHAR(8000), @pDelimiter CHAR(1))
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE!  IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
     -- enough to cover VARCHAR(8000)
  WITH E1(N) AS (
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                ),                          --10E+1 or 10 rows
       E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
       E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
 cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
                     -- for both a performance gain and prevention of accidental "overruns"
                 SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
                ),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
                 SELECT 1 UNION ALL
                 SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
                ),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
                 SELECT s.N1,
                        ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
                   FROM cteStart s
                )
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
 SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
        Item       = SUBSTRING(@pString, l.N1, l.L1)
   FROM cteLen l

your'e function is working good.but i want only single row it's splitting 6 rows — Krish, Nov 18 '16 at 12:19
@Krish My script already returns one row per Email, with up to six instances of a 2 character string in separate columns. Do you mean you want the results in one column? If so, you should update your question to reflect this. — iamdave, Nov 18 '16 at 13:20

JohnLBevan · Accepted Answer · 2016-11-18T20:06:38.457

Solution

To do exactly what you've asked; i.e. pull back just the 2 char codes from within the email address's domain, you could use a function such as this:

create function dbo.fn_Get2AlphaCharCodesFromEmail
(
    @email nvarchar(254) --max length of an email is 254: http://stackoverflow.com/questions/386294/what-is-the-maximum-length-of-a-valid-email-address
) returns nvarchar(254)
as
begin
    declare @result nvarchar(254) = null
    , @maxLen int = 254 
    ;with cte(i, remainder,result) as 
    (
        select cast(0 as int)
        , cast('.' + substring(@email,charindex('@',@email)+1,@maxLen) + '.' as nvarchar(254))
        , cast(null as nvarchar(254))
        union all
        select cast(i+1 as int)
        , cast(substring(remainder,patindex('%.[A-Z][A-Z].%',remainder)+3,@maxLen)as nvarchar(254))
        , cast(coalesce(result + ',','') + substring(remainder,patindex('%.[A-Z][A-Z].%',remainder)+1,2) as nvarchar(254))
        from cte
        where patindex('%.[A-Z][A-Z].%',remainder) > 0
    )
    select top 1 @result = result from cte order by i desc;
    Return @result;
end
go

--demo
select dbo.fn_Get2AlphaCharCodesFromEmail ('claudio.passerini@uni.re.dit.mn.us') 
--returns: re,mn,us
select dbo.fn_Get2AlphaCharCodesFromEmail ('claudio.passerini@uni.123.dit.mnx.usx')
--returns: NULL

Explanation

Create a function called fn_Get2AlphaCharCodesFromEmail in the schema dbo which takes a single parameter, @email which is a string of up to 254 characters, and returns a string of up to 254 characters.
```
create function dbo.fn_Get2AlphaCharCodesFromEmail
(
    @email nvarchar(254)
) returns nvarchar(254)
as
begin
    --... code that does the work goes here
end
```
declare the variables we'll be using later on.
- @result holds the value we'll be returning from the function
- @maxLen records the maximum length of an email; this makes it slightly easier should this length ever need to change; though not entirely simple since we have to specify the 254 length in our column & variable definitions later on anyway.
  
  declare @result nvarchar(254) = null , @maxLen int = 254
Now comes the interesting bit. We create a common table expression with 3 columns:
- i is used to record which iteration each record was produced in; the highest value of i is the last record to be created.
- remainder is used to hold the yet-to-be processed characters from the email.
- result is used to record the 2 char codes; each new row adds another value to this column's comma separated values.
  
  ;with cte(i, remainder,result) as ( --code to iterate through the email string, breaking it down, goes here )
this gives us our first row in the cte "table".
- The cast statements throughout this part are to ensure we have a consistent data type, as data types in a CTE are implicit, and not always correct
- we initialise i (i.e. the first column) with value 0 to say that this is our first row (we could choose pretty much any value here; it doesn't matter
- we initialise remainder (i.e. 2nd column) as the part of the email address which follows the @ character; i.e. the email's domain.
- we initialise result (i.e. 3rd column) as null; as we've not yet found a result (i.e. a 2 char string within the email's domain)
- there is no from component as we're just getting data from the @email variable; no tables/views/etc are required.
```
select cast(0 as int)
, cast('.' + substring(@email,charindex('@',@email)+1,@maxLen) + '.' as nvarchar(254))
, cast(null as nvarchar(254))
```
union all is used to combing the first result(s) with the results of the next (recurring) statement. NB: The CTE code before this statement is run once to give initial values; the code after is run once for each new set of rows generated.
```
    union all
```
The recurring code in the CTE is applied to new rows in the CTE until no new rows are generated.
- i takes the value of the previous iteration's row's i incremented by 1.
```
select cast(i+1 as int)
```
- remainder takes the previous iteration's remainder, and removes everything before (and including) the next 2 character code (result).
- patindex('%.[A-Z][A-Z].%',remainder) returns a number giving the location of the a string containing a dot followed by 2 letters followed by a dot, occurring anywhere in the input string
```
, cast(substring(remainder,patindex('%.[A-Z][A-Z].%',remainder)+3,@maxLen)as nvarchar(254))
```
- result uses the same logic as remainder, only it takes the 2 characters found, rather than everything after them. These characters are added on to the end of the previous iteartion's row's result value, separated by a comma.
```
, cast(coalesce(result + ',','') + substring(remainder,patindex('%.[A-Z][A-Z].%',remainder)+1,2) as nvarchar(254))
```
- the from cte part just says that we're referencing the same "table" we're creating; i.e. this is how the recursion occurs
```
from cte
```
the where statement is used to prevent infinite recursion; i.e. once there are no more 2 char codes left in the remainder, stop looking.
```
    where patindex('%.[A-Z][A-Z].%',remainder) > 0
```
Once we've found all the 2 char codes in the string, we know that the last row's result will contain the complete set; as such we assign this single row's value to the @result variable.
```
select top 1 @result = result 
```
the from statement shows we're referencing the data we created in our with cte statement
```
from cte 
```
the order by is used to determine which record comes first (i.e. which record is the top 1 record). We want it to be the last row generated by the CTE. Since we've been incrementing i by 1 each time, this last record will have the highest value of i, so by sorting by i desc (descending) that last generated row will be the row we get.
```
order by i desc;
```
Finally, we return the result generated above.
```
Return @result;
```

Alternative Approach

However, if you're trying to extract information from your emails, I'd recommend an alternate approach... have a list of values that you're looking for, and compare your email with that, without having to break apart the email address (beyond splitting on the @ to ensure you're only checking the email's domain).

declare @countryCodes table (code nchar(2), name nvarchar(64)) --you'd use a real table for this; I'm just using a table variable so this demo's throwaway code
insert into @countryCodes (code, name) 
values 
     ('es','Spain')
    ,('fr','France')
    ,('uk','United Kingdom') 
    ,('us','USA') 
    --etc.

--check a single mail
declare @mail nvarchar(256) = 'claudio.passerini@uni.re.dit.mn.us'
if exists (select top 1 1 from @countryCodes where  '.' + substring(@mail,charindex('@',@mail)+1,256) + '.' like '%.' + code + '.%')
begin
    select name from @countryCodes where  '.' + substring(@mail,charindex('@',@mail)+1,256) + '.' like '%.' + code + '.%'
end
else
begin
    select 'no results found'
end
--check a bunch of mails
declare @emailsToCheck table (email nvarchar(256))
insert into @emailsToCheck (email)
values
('claudio.passerini@uni.re.dit.mn.us')
,('someone@someplace.co.uk')
,('cant.see.me@never.never.land')
,('some.fr.address.hidden@france.not.in.this.bit')

select e.email, c.name
from @emailsToCheck e
left outer join @countryCodes c
on '.' + substring(email,charindex('@',email)+1,256) + '.' like '%.' + code + '.%'
order by e.email, c.name

I want scalar function..I have one email column that is having values like this 'claudio.passerini@uni.re.dit.mn.us'. I want to take two characters strings between dot (to check for the countries and states codes). i want result like this col1=re,mn,us — Krish, Nov 18 '16 at 14:11
that function should work like this..select functionname('claudio.passerini@uni.re.dit.mn.us')..reslut col1=re,mn,us — Krish, Nov 18 '16 at 14:13
No worries, I've created a function which gives you what you're after... however I recommend you consider the alternative approach as that would be more flexible/useful in most of the use cases I can think of... — JohnLBevan, Nov 18 '16 at 19:15
Whatever I tried your function is working as VERY GOOD.Thank you for given your time. — Krish, Nov 19 '16 at 07:52

How to get all the two characters long substrings separated by dot (.) from an email address in SQL server. I want Scalar function

3 Answers3

Query to get the data

Table valued function to split out the strings, courtesy of Jeff Moden