
Table1

ID        Name              Tags 
----------------------------------
1        Customer1         Tag1,Tag5,Tag4 
2        Customer2         Tag2,Tag6,Tag4,Tag11 
3        Customer5         Tag6,Tag5,Tag10 

and Table2

ID     Name             Tags 
----------------------------------
1    Product1     Tag1,Tag10,Tag6 
2    Product2     Tag2,Tag1,Tag5 
3    Product5   Tag1,Tag2,Tag3 

What is the best way to join Table1 and Table2 on the Tags column?

It should look at the comma-separated Tags column in Table2 for each comma-separated tag in the Tags column of Table1.

Note: Tables are not full-text indexed.


Justin
  • Your question makes me think of an HTML solution (jQuery), but the tag `sql` makes me think otherwise – Ahmad Jul 02 '15 at 09:28
  • CSV in a column is frequently used but is very bad practice. It is good to re-design the DB. However, at least you can surround the first and last values with commas (",Tag1,Tag2,Tag3,") to be able to scan with `Tags LIKE '%,Tag1,%'`. Otherwise you may scan `LIKE '%Tag1%'` and find "Tag19". – i486 Jul 02 '15 at 09:47
  • You might be able to use the solutions from http://stackoverflow.com/q/5493510/121309 to create intermediary tables and query on those. Don't expect any performance though. – Hans Kesting Jul 02 '15 at 09:51

4 Answers


The best way is not to have comma-separated values in a column. Just use normalized data and you won't have trouble with querying like this - each column is supposed to hold only one value.
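
For illustration, a minimal sketch of what the normalized design might look like (table and column names here are illustrative, not from the question):

```sql
-- Junction tables replacing the comma-separated Tags columns (names are illustrative)
CREATE TABLE CustomerTag (CustomerID int NOT NULL, Tag varchar(50) NOT NULL,
                          PRIMARY KEY (CustomerID, Tag));
CREATE TABLE ProductTag  (ProductID int NOT NULL, Tag varchar(50) NOT NULL,
                          PRIMARY KEY (ProductID, Tag));

-- Customers and products sharing at least one tag: a plain, indexable join
SELECT DISTINCT c.CustomerID, p.ProductID
FROM CustomerTag c
JOIN ProductTag p ON p.Tag = c.Tag;
```

With this shape the join uses the primary-key indexes directly; no string scanning is needed.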

Without this, there's no way to use any indexes, really. Even a full-text index behaves quite differently from what you might think, and full-text indexes are inherently clunky to use - they're designed for searching text, not meaningful data. In the end, you will not do much better than something like

where (Col = 'txt' or Col like 'txt,%' or Col like '%,txt' or Col like '%,txt,%')

Using an xml column might be another alternative, though it's still quite silly. It would at least allow you to treat the values as a collection.
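
A sketch of that xml approach, assuming the tags were stored in an xml column (here called TagsXml, an assumed name) as elements like `<t>Tag1</t><t>Tag5</t>`:

```sql
-- Shred the xml column into one row per tag (TagsXml is an assumed column name)
SELECT c.ID, c.Name, t.v.value('.', 'varchar(50)') AS Tag
FROM Table1 c
CROSS APPLY c.TagsXml.nodes('/t') AS t(v);
```

The shredded rows can then be joined like any other table, though the query still cannot seek on an ordinary index.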

Luaan
  • thank you for your answer. What do you think about my solution... Step 1: split Table1's tags on the "," comma and make a vertical table, for example #dt with distinct tags => Tag1 Tag2 Tag3. Step 2: join Table2 with #dt –  Jul 02 '15 at 09:51
  • @memettayanc About the easiest and most proper way to do this - instead of storing the tags in a single varchar column, just use a table. Fixing it up while you're *querying* the data will pretty much guarantee poor performance - there's no way to use indexing, really. If you just do this with a temporary table while running the query, you'll get even worse than the `like` example above. – Luaan Jul 02 '15 at 09:59

I don't think there will ever be an easy and efficient solution to this. As Luaan pointed out, it is a very bad idea to store data like this: you lose most of the power of SQL when you squeeze what should be individual units of data into a single cell.

But you can manage this at the slight cost of creating two user-defined functions. First, use this brilliant recursive technique to split the strings into individual rows based on your delimiter:

-- Inline TVF: recursively walks the string, emitting one row per delimited piece
CREATE FUNCTION dbo.TestSplit (@sep char(1), @s varchar(512))
RETURNS table
AS
RETURN (
    -- Pieces(pn, start, stop): piece number, start position, position of next delimiter
    WITH Pieces(pn, start, stop) AS (
      SELECT 1, 1, CHARINDEX(@sep, @s)
      UNION ALL
      SELECT pn + 1, stop + 1, CHARINDEX(@sep, @s, stop + 1)
      FROM Pieces
      WHERE stop > 0
    )
    SELECT pn AS SplitIndex,
      SUBSTRING(@s, start, CASE WHEN stop > 0 THEN stop - start ELSE 512 END) AS SplitPart
    FROM Pieces
  )
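
To see what it produces, a quick call splitting one of the sample tag lists:

```sql
SELECT * FROM dbo.TestSplit(',', 'Tag1,Tag5,Tag4')
-- SplitIndex  SplitPart
-- 1           Tag1
-- 2           Tag5
-- 3           Tag4
```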

Then, make a function that takes two strings and counts the matches:

CREATE FUNCTION dbo.MatchTags (@a varchar(512), @b varchar(512))
RETURNS INT
AS
BEGIN
RETURN 
(SELECT COUNT(*)
FROM dbo.TestSplit(',', @a) a 
INNER JOIN dbo.TestSplit(',', @b) b 
    ON a.SplitPart = b.SplitPart)
END
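
A quick sanity check with two of the sample tag lists (Tag1 and Tag5 are the shared tags):

```sql
SELECT dbo.MatchTags('Tag1,Tag5,Tag4', 'Tag2,Tag1,Tag5') AS Matches
-- Matches = 2
```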

And that's it. Here is a test run with table variables:

DECLARE @A TABLE (Name VARCHAR(20), Tags VARCHAR(100))
DECLARE @B TABLE (Name VARCHAR(20), Tags VARCHAR(100))

INSERT INTO @A ( Name, Tags )
VALUES 
( 'Customer1','Tag1,Tag5,Tag4'),
( 'Customer2','Tag2,Tag6,Tag4,Tag11'),
( 'Customer5','Tag6,Tag5,Tag10')

INSERT INTO @B ( Name, Tags )
VALUES 
( 'Product1','Tag1,Tag10,Tag6'),
( 'Product2','Tag2,Tag1,Tag5'),
( 'Product5','Tag1,Tag2,Tag3')

SELECT * FROM @A a
INNER JOIN @B b ON dbo.MatchTags(a.Tags, b.Tags) > 0
ttzn
  • thank you for the answer, your script works fine but is too slow. Table rowcount = 2541, Table2 rowcount = 885, select ==> 132 seconds http://i57.tinypic.com/inxzqt.png –  Jul 02 '15 at 14:18

I developed a solution as follows:

CREATE TABLE [dbo].[Table1](
    Id      int not null,
    Name    nvarchar(250) not null,
    Tag     nvarchar(250) null
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[Table2](
    Id      int not null,
    Name    nvarchar(250) not null,
    Tag     nvarchar(250) null
) ON [PRIMARY]
GO

Get sample data for Table1; this will insert 28000 records:

INSERT INTO Table1
SELECT CustomerID,CompanyName, (FirstName + ',' + LastName)
FROM AdventureWorks.SalesLT.Customer 
GO 3    

Sample data for Table2. I need the same tags in Table2:

declare @tag1 nvarchar(50) = 'Donna,Carreras'
declare @tag2 nvarchar(50) = 'Johnny,Caprio'

Get sample data for Table2; this will insert 9735 records:

INSERT INTO Table2
SELECT ProductID, Name, (CASE WHEN right(ProductID, 1) >= 5 THEN @tag1 ELSE @tag2 END)
FROM AdventureWorks.SalesLT.Product
GO 3

My Solution

CREATE TABLE #dt (
    Id int IDENTITY(1,1) PRIMARY KEY,
    Tag nvarchar(250) NOT NULL
);

I've created a temp table, and I will fill it with the distinct tags from Table1:

insert into #dt(Tag)
SELECT distinct Tag
FROM      Table1

Now I need a vertical table for the tags:

CREATE TABLE #Tags ( Tag nvarchar(250) NOT NULL );

Now I fill the #Tags table with a WHILE loop; you could use a cursor, but WHILE is faster:

DECLARE @Rows int = 1
DECLARE @Tag nvarchar(1024)
DECLARE @Id int = 0

WHILE @Rows > 0
BEGIN
    -- Walk #dt one row at a time, keyed on the IDENTITY column
    SELECT TOP 1 @Tag = Tag, @Id = Id FROM #dt WHERE Id > @Id ORDER BY Id
    SET @Rows = @@ROWCOUNT
    IF @Rows > 0
    BEGIN
        -- Split the CSV tag list and append the pieces to #Tags
        INSERT INTO #Tags(Tag) SELECT Data FROM dbo.StringToTable(@Tag, ',')
    END
END
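
Note that dbo.StringToTable is not defined in this answer. A minimal sketch compatible with the call above (the parameter order and the Data column name are inferred from the usage, not given in the original) could look like:

```sql
-- Assumed split helper: returns one row per delimited piece, in a column named Data
CREATE FUNCTION dbo.StringToTable (@s nvarchar(1024), @sep char(1))
RETURNS @result TABLE (Data nvarchar(250))
AS
BEGIN
    DECLARE @pos int = CHARINDEX(@sep, @s)
    WHILE @pos > 0
    BEGIN
        -- Emit the piece before the delimiter, then drop it from the working string
        INSERT INTO @result(Data) VALUES (LEFT(@s, @pos - 1))
        SET @s = SUBSTRING(@s, @pos + 1, LEN(@s))
        SET @pos = CHARINDEX(@sep, @s)
    END
    -- Emit the trailing piece, if any
    IF LEN(@s) > 0 INSERT INTO @result(Data) VALUES (@s)
    RETURN
END
```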

Last step: join Table2 with #Tags:

select distinct t.*
from Table2 t  
    inner join  #Tags on (',' + t.Tag + ',') like ('%,' + #Tags.Tag + ',%')

Table1 rowcount = 28000, Table2 rowcount = 9735; the select takes less than 2 seconds.


I use this kind of solution with paths of trees. First put a comma at the very beginning and at the very end of the stored string. Then you can write

Where col1 like '%,' || col2 || ',%'

Some databases can use an index for LIKE as well (Postgres does it partially), so this can also be reasonably efficient. I don't know SQL Server.
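
In SQL Server syntax, where + is the concatenation operator rather than ||, the same predicate would read:

```sql
-- col1 already stored with leading and trailing commas, as described above
WHERE col1 LIKE '%,' + col2 + ',%'
```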

jurhas
  • MS SQL uses `+` for string concatenation. And it does use indexes for `like`, but *only* when it's `something%` - that's the way the index is ordered. I'm pretty sure postgres does something similar. Oh, and of course, it may choose to *scan* the index - I'm talking about seeks. – Luaan Jul 02 '15 at 10:00