0

So I have a dataset that looks roughly like this:

01/02/2022, "bananas, apples and oranges", "accepted", "John"   
01/05/2022, "bananas and grapes", "accepted", "Mary"  
01/08/2022, "kale and bananas", "rejected", "John"

I need to transform it into:

01/02/2022, "bananas", "accepted", "John"
01/02/2022, "apples", "accepted", "John"
01/02/2022, "oranges", "accepted", "John"
01/05/2022, "bananas, "accepted", "Mary"

etc.

My problem is extracting the substrings for the types of fruit. The pattern is reliable in that the last delimiter is always " and ", and if there is another one before it, it's always ", ". So I can use CHARINDEX to find the position of the start of each word. But I struggle with finding the position of the end of the word. I could probably do it with a ton of CASE WHENs but is there an easier way?

(I asked this question earlier but I did a bad job of describing the problem then.)

  • While asking a question, you need to provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example): (1) DDL and sample data population, i.e. CREATE table(s) plus INSERT T-SQL statements. (2) What you need to do, i.e. logic and your code attempt implementation of it in T-SQL. (3) Desired output, based on the sample data in the #1 above. (4) Your SQL Server version (SELECT @@version;). – Yitzhak Khabinsky Feb 08 '22 at 17:00
  • Are we looking at 2 columns here or 4? Include the actual table schema / columns to make it clear. Are double quotes part of the data?? – Stu Feb 08 '22 at 17:00
  • If I understand correctly, you want to split a delimited string into multiple rows. There are many questions in SO related to this, such as this [this one](https://stackoverflow.com/questions/57245618/sql-server-split-delimited-string-into-rows). – Aziz Feb 08 '22 at 17:04
  • I'd suggest you use a proper format such as JSON or XML, it would be far easier to parse in SQL. But if that's what you have then that's what you have. – Charlieface Feb 08 '22 at 20:08

1 Answers1

1

Given this data:

CREATE TABLE dbo.whatever
(
  d date, 
  fruits varchar(255), 
  status varchar(12), 
  name   varchar(32)
 );

INSERT dbo.whatever VALUES
('20220102', 'bananas, apples and oranges', 'accepted', 'John'),   
('20220105', 'bananas and grapes', 'accepted', 'Mary'),
('20220108', 'kale and bananas', 'rejected', 'John');

You want to split strings but based on two delimiters. So before you split them, you can just replace one with the other:

SELECT w.d, fruit = LTRIM(f.value), w.status, w.name
FROM dbo.whatever AS w 
CROSS APPLY 
STRING_SPLIT(REPLACE(w.fruits, ' and ', ', '), ',') AS f;

Results:

d fruit status name
2022-01-02 bananas accepted John
2022-01-02 apples accepted John
2022-01-02 oranges accepted John
2022-01-05 bananas accepted Mary
2022-01-05 grapes accepted Mary
2022-01-08 kale rejected John
2022-01-08 bananas rejected John
Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490