1

I'm designing a SQL schema, where I have a "Contact" table with a column "LanguagesSpoken" that may have multiple values such as 'en,fr'.

I want to query the Contact table to retrieve every contact that speak a specific language:

select * from Contact where LanguagesSpoken like '%en%'

This is however not a satisfying solution to me, performance-wise. Is there a way to improve performance by indexing each individual value of the CSV column?

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
uzul
  • 1,096
  • 9
  • 23
  • 5
    Never store data as comma separated items, it will only cause you lots of trouble. – jarlh Jun 07 '16 at 07:11
  • I agree with you, this however seems overkill to me to create a table just to store these. There's nothing else to store but a list of language per contact... – uzul Jun 07 '16 at 07:13
  • 1
    That's OK; good tables are often skinny (few columns, possibly many rows). You can index it to good effect. You can't index a comma-separated string effectively. You've denormalized, which kills some sorts of performance. If you want your performance back, renormalize. Note that you can also ensure that the same language isn't entered twice for a single contact too — trivially, with your unique index. – Jonathan Leffler Jun 07 '16 at 07:22
  • 2
    SQL Server has data types *designed* for holding multiple values - tables and XML. If you're not using either of the types *designed* for holding multiple values, don't be surprised that you're not provided with tools for optimizing when you just stuff everything into a string. – Damien_The_Unbeliever Jun 07 '16 at 07:23
  • 2
    This is a bad idea any way you look at it. [For more information, read the answers to this question.](http://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad) – Zohar Peled Jun 07 '16 at 07:27

3 Answers3

4

First of all - doing this you violate 'First Normal Form' - if and only if the domain of each attribute contains only atomic (indivisible) values

But if you really need this then try apply full-text index and search by whole-word match with CONTAINS (https://technet.microsoft.com/en-us/library/ms187787(v=sql.110).aspx)

Dewfy
  • 23,277
  • 13
  • 73
  • 121
1

In Summary,there is no index which can offer good performance ,if you are storing the data in a comma seperated list.

Even though you use like condition,SQL can accurately guess string statistics if there is an index on the column provided the string is less than 80 characters.if string exceeds 80 characters it will take first 40 and last 40 characters to create statistics on this string column.Other than this ,you are not having any advantages with the way you are querying data using like and storing it..

Demo:

create table 
#test
(
id int,
langspoken varchar(100)
)


insert into #test
select 1,'en,fr,ger,en_us'
union all
select  2,'en,fr'
go 100

create  index nci on #test(langspoken)

select * from #test where langspoken like '%fr%'

now lets see estimates :

enter image description here

Further,if you are not able to store data in a delimited way,accurate way to query data is like below using one of the string functions from here..

create table 
#test
(
id int,
langspoken varchar(100)
)


insert into #test
select 1,'en,fr,ger,en_us'
union all
select  2,'en,fr'


select * from #test t
cross apply
dbo.SplitStrings_Numbers(t.langspoken,',') 
where item='en'
TheGameiswar
  • 27,855
  • 8
  • 56
  • 94
1

Please try the below code. Its working with SQL server 2012.

DECLARE @Table TABLE (ID int, Languages Varchar(10))
INSERT @Table
(ID,Languages)
VALUES
(1, 'en,fr,ge'),
(2, 'en,ky,ge'),
(3, 'hi,fr,ge')


SELECT ID ,Languages FROM
    (
     SELECT 
         A.ID AS ID,  
         Split.a.value('.', 'VARCHAR(100)') AS Languages  
     FROM  (
            SELECT 
                ID,  
                CAST ('<M>' + REPLACE(Languages, ',', '</M><M>') + '</M>' AS XML) AS Languages  
            FROM  
                @Table A
             )  AS A 
    CROSS APPLY Languages.nodes ('/M') AS Split(a)
    ) AS B
WHERE B.Languages ='en'
Praveen ND
  • 540
  • 2
  • 10