4

I am working on MSSQL, trying to split one string column into multiple columns. The string column has numbers separated by semicolons, like:

190230943204;190234443204;

However, some rows have more numbers than others, so in the database you can have

190230943204;190234443204;
121340944534;340212343204;134530943204

I've seen some solutions for splitting one column into a specific number of columns, but not variable columns. The columns that have less data (2 series of strings separated by commas instead of 3) will have nulls in the third place.

Ideas? Let me know if I must clarify anything.

Justin R.
  • 23,435
  • 23
  • 108
  • 157
user2522217
  • 345
  • 1
  • 7
  • 20
  • 2
    That's bad data design. _Never_ store comma-separated data in columns. Hopefully the purpose here is the fix the schema. – Joel Coehoorn Jul 01 '13 at 22:22
  • You can find what you need here: http://stackoverflow.com/questions/5342629/mysql-split-data-into-multiple-rows – Zzz Jul 01 '13 at 22:26
  • Sorry about that, they are not commas, but semicolons, that separate the data. – user2522217 Jul 01 '13 at 22:26
  • possible duplicate of [Split one column to multiple columns but data will vary SQL](http://stackoverflow.com/questions/13588800/split-one-column-to-multiple-columns-but-data-will-vary-sql) – Ken White Jul 01 '13 at 22:31
  • I've seen that question, it is not a duplicate of this because the person there knows how many columns he will be splitting into (4) while I don't. – user2522217 Jul 01 '13 at 22:33
  • is it always going to be 2 or three columnns, if so just use datalength or perhaps like e.g like %;%;% and go with the simple solution. Feel free to slap whoiever came up with this design in the first place, in fact give them an extra one from me. – Tony Hopkinson Jul 01 '13 at 22:36
  • Haha thanks, I think it would be at most 10 columns, but the data from each row can vary from 1 - 10 strings separated by semicolons. – user2522217 Jul 01 '13 at 22:38
  • Are you trying to fix the data or just read it in an application? If the latter case, what language are you working in? – Justin R. Jul 01 '13 at 22:42
  • I am trying to fix the data so I can analyze it better later. I am working in SQL on the MS SQL Server Management Studio – user2522217 Jul 01 '13 at 22:44
  • If you want to analyse, then you want rows not columns as per RandomSeeds way forward. After that you are looking at some sort of splitting and iteration manouever. – Tony Hopkinson Jul 01 '13 at 23:04

3 Answers3

3

Splitting this data into separate columns is a very good start (coma-separated values are an heresy). However, a "variable number of properties" should typically be modeled as a one-to-many relationship.

CREATE TABLE main_entity (
  id INT PRIMARY KEY,
  other_fields INT
);

CREATE TABLE entity_properties (
  main_entity_id INT PRIMARY KEY,
  property_value INT,
  FOREIGN KEY (main_entity_id) REFERENCES main_entity(id)
);

entity_properties.main_entity_id is a foreign key to main_entity.id.

Congratulations, you are on the right path, this is called normalisation. You are about to reach the First Normal Form.

Beweare, however, these properties should have a sensibly similar nature (ie. all phone numbers, or addresses, etc.). Do not to fall into the dark side (a.k.a. the Entity-Attribute-Value anti-pattern), and be tempted to throw all properties into the same table. If you can identify several types of attributes, store each type in a separate table.

RandomSeed
  • 29,301
  • 6
  • 52
  • 87
1

If these are all fixed length strings (as in the question), then you can do the work fairly simply (at least relative to other solutions):

select substring(col, 1+13*(n-1), 12) as val
from t join
     (select 1 as n union all select union all select 3
     ) n
     on len(t.col) <= 13*n.n

This is a useful hack if all the entries are the same size (not so easy if they are of different sizes). Do, however, think about the data structure because semi-colon (or comma) separated list is not a very good data structure.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • They are indeed fixed-length strings. Can you clarify what "n" and "t.col" are? Also what is "val"? – user2522217 Jul 01 '13 at 23:46
  • 1
    @user2522217 . . . `t` is the name of your table. `col` is the name of the column containing the strings. `n` and `n.n` are names for the subquery and its columns. – Gordon Linoff Jul 02 '13 at 01:26
1

IF I were you, I would create a simple function that is dividing values separated with ';' like this:

IF EXISTS (SELECT * FROM sysobjects WHERE id = object_id(N'fn_Split_List') AND xtype IN (N'FN', N'IF', N'TF'))
BEGIN
    DROP FUNCTION [dbo].[fn_Split_List]
END
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION [dbo].[fn_Split_List](@List NVARCHAR(512))
RETURNS @ResultRowset TABLE ( [Value] NVARCHAR(128) PRIMARY KEY)
AS
BEGIN
    DECLARE @XML xml = N'<r><![CDATA[' + REPLACE(@List, ';', ']]></r><r><![CDATA[') + ']]></r>'

    INSERT INTO @ResultRowset ([Value])
    SELECT DISTINCT RTRIM(LTRIM(Tbl.Col.value('.', 'NVARCHAR(128)')))
    FROM @xml.nodes('//r') Tbl(Col)

    RETURN
END

GO

Than simply called in this way:

SET NOCOUNT ON
GO

    DECLARE @RawData TABLE( [Value] NVARCHAR(256))

    INSERT INTO @RawData ([Value] )
    VALUES ('1111111;22222222')
          ,('3333333;113113131')
          ,('776767676')
          ,('89332131;313131312;54545353')

    SELECT SL.[Value]
    FROM @RawData AS RD
    CROSS APPLY [fn_Split_List] ([Value])  as SL

SET NOCOUNT OFF
GO

The result is as the follow:

Value
1111111
22222222
113113131
3333333
776767676
313131312
54545353
89332131 

Anyway, the logic in the function is not complicated, so you can easily put it anywhere you need.

Note: There is not limitations of how many values you will have separated with ';', but there are length limitation in the function that you can set to NVARCHAR(MAX) if you need.

EDIT:

As I can see, there are some rows in your example that will caused the function to return empty strings. For example:

number;number;

will return:

number
number
'' (empty string)

To clear them, just add the following where clause to the statement above like this:

SELECT SL.[Value]
FROM @RawData AS RD
CROSS APPLY [fn_Split_List] ([Value])  as SL
WHERE LEN(SL.[Value]) > 0
gotqn
  • 42,737
  • 46
  • 157
  • 243