1

I am struggling to split the data in the column into multiple columns. I have data of names of customers and the data needs cleaning as there can be duplicates and I also need to set up new standards for the future data. I have been able to successfully split the first two words in the string but not being able to split further data.

I only have read permissions. So I cannot create any functions.

For example:

Customer name: Illinois Institute of Technology

My query will only fetch "Illinois" in one column and "Institute of Technology" in other column. Considering delimiter as 'space', I am looking to separate each word into separate columns. I am not sure how to identify the 2nd space and further spaces.

I have also tried using 'parsename' function, but I feel it will create more difficulty in cleaning the data.

select name, 
left (name, CHARINDEX(' ', name)) as f,
substring(name, CHARINDEX(' ', name)+1, len(name)) as s
from customer
Suraj Kumar
  • 5,547
  • 8
  • 20
  • 42
Saloni
  • 11
  • 1
  • 2

2 Answers2

1

EDIT: This only works for SQL Server 2016 and above. OP has SQL Server 2014.

There isn't really a good way to do this, but here's one method that might work for you, modified from an example here:

create table #customer (id int, name nvarchar(max))

insert into #customer
values  (1, 'Illinois Institute of Technology'), 
        (2, 'The City University of New York'), 
        (3, 'University of the District of Columbia'), 
        (4, 'Santa Fe University of Art and Design')

;   
with c as(
select id, name
      ,value
      ,row_number() over(partition by id order by (select null)) as rn
from #customer
cross apply string_split(name, ' ') as bk
)
select id, name
      ,[1]
      ,[2]
      ,[3]
      ,[4]
      ,[5]
      ,[6]
from c
pivot(
    max(value)
    for rn in([1],[2],[3],[4],[5],[6])  
) as pvt

drop table #customer

Notice a few things:

  1. You have to explicitly declare columns in the output. You could create some overly-complex dynamic SQL that would generate as many column names as you need, but that makes it harder to fix issues and make modifications, and you probably won't get the same query optimisations.
  2. Because of (1), you will just end up dropping words if there are too many to fit the number of columns you've defined. See the last example, id=4.
  3. Beware of other methods that might not keep your words in order, or that skip out duplicate words, eg. "of" in the example id=3.
pcdev
  • 2,852
  • 2
  • 23
  • 39
  • I am using sql server 14.0. – Saloni Jun 04 '19 at 10:09
  • String_split function is available in 16.0 – Saloni Jun 04 '19 at 10:09
  • Ah, well you didn't tag the question with your SQL Server version, how was I to know? :) I'll leave this answer in case it helps someone else with a similar question. – pcdev Jun 04 '19 at 10:27
  • As an alternative, have a look at this question and its answers: https://stackoverflow.com/q/5722700/2869344 – pcdev Jun 04 '19 at 10:39
  • Well, I am new to stack overflow. I might have missed mentioning it. But the link provided by you was very much helpful. Thank you :) – Saloni Jun 04 '19 at 17:21
0

You don't mention what you plan to do with the data once you retrieve it. Since you only have read permissions you can't store it in a table. Something you may not have thought of is to create a local database where you do have write permissions and do your work there. The easiest way would be to get a copy of the database, but you could also access the readonly database using fully qualified names.

As for your string-splitting needs, I can point you to a great way of splitting strings created by a gentleman named Jeff Moden. You can find an article discussing it as well as a link to code here:

Tally OH! An Improved SQL 8K “CSV Splitter” Function

While it's a very informative read, inasmuch as much of it is about performance testing and such, you might want to skip much of it and go straight for the code, but try to pick out the stuff which discusses functionality and read that because the code will strike the uninitiated as unconventional at best.

The code creates a function but because you don't have permission to do that you will have to remove the meat from the function and use it directly.

I'll try to provide a little overview of the approach to get you started.

The heart of the approach is a Tally Table. If you are unfamiliar with that term (it is also called a Numbers table), it is basically a table in which every row contains an integer and the rows are basically a set of all integers in some range, usually pretty large. So how does a Tally Table help with splitting strings? The magic happens by joining the tally table to the table containing the strings to be split and using the where clause to identify the delimiters by looking at 1-character substrings indexed by the tally table numbers. The natural set-based operations of SQL Server then search for all of your delimiters in one go and your select list then extracts the substrings bracketed by the delimiters. It's really quite clever and very fast.

When you get into the code, the first part of the function may look very strange (because it is), but it is necessary since you only have read rights. It is basically using SQL Server's Common Table Expression (CTE) functionality to build an internal tally table on the fly using some ugly logic which you don't really need to understand (but if you want to dig in, it's clever even if it is ugly). Since the table is only local to the query it won't violate your readonly permissions.

It also uses CTEs to represent the starting index and length of the delimited substrings so the final query is pretty simple, yielding rows with a row number followed by a string split from the original data.

I hope this helps you with your task -- it really is a nice tool to have in your toolkit.

Edit: I just realized that you wanted your output to be in separate columns as opposed to rows. That's quite a bit more difficult since each of the columns you are splitting might produce a different number of strings and also, your columns will need names. If you already know the column names and know the number of output strings it will be easier but still tricky. The row data from the splitter could be tweaked to provide an identifier for the row where the data originated and the row numbers might help with creating arbitrary column names if you need them, but the big problem is that with only read privileges you will find processing things in steps rather tricky -- CTEs can be employed even further for this but your code will likely get rather messy unless the requirements are pretty simple.

Les N
  • 11
  • 4