1

I have a column with some single values and comma separated values and I want to replace them by matching to a description table.

I have a table that looks like this:

enter image description here

Text representation:

Length    Directory                                                                Extensions                Extension Description Type
6684672   Y:\Data\Retail\WalmartMX\Development\Curt.Wolfe\ChristiesAnalysisTool    accdb accdb               Access Database Development/DB
2002944   Y:\Data\Retail\WalmartMX\Development\SourceCode\WalmartMxDecipher\APPS   ACCDB accdb               Access Database Development/DB
5312512   Y:\Data\Retail\WalmartMX\Development\SourceCode\AnalysisTool             accdb accdb               Access Database Development/DB
30986240  Y:\Data\Retail\WalmartMX\Utilities\Tracking\Stats\BAK                    accdb accdb               Access Database Development/DB
112917071 Y:\Data\Retail\WalmartMX\Deploy\Development\SourceCode\WalmartMxDecipher accdb,accde,zip           NULL  NULL  NULL
139053182 Y:\Data\Retail\BQ\Utilities\EMT                                          accdb,bat,docx,laccdb,txt NULL  NULL  NULL
32116006  Y:\Data\Retail\WalmartMX\Utilities\Tracking\Stats                        accdb,bat,laccdb,sql,xlsx NULL  NULL  NULL

I need to replace the values in the extension field or better yet create a new field that has the listings with description from another table.

  1. Match on extension (could be comma separated need them all)
  2. Provide a new field with the description
    a. New field should (if extension is accdb) be Access Database (Match from another table)
    b. If there is csv it should read (if extensions are txt,xml) CSV File, TextFile (again match from another table on the extension field)

Expected output: enter image description here

Code so far:

SELECT *
  FROM [SandboxVinny].[dbo].[FinalDirectoryListing] FDL
  left JOIN dbo.SourceExtensions SE
    on SE.Extension = FDL.Extensions

Obviously using this I get null values for any row that has multiple file extensions in the field.

Hogan
  • 69,564
  • 10
  • 76
  • 117
VinnyGuitara
  • 605
  • 8
  • 26

2 Answers2

2

How to solve:

  1. You need a unique identifier per row.

  2. Create another table with rowid and extension column

  3. Populate that table with one to many for each extension (so if the rowid goes from 1 to 7 in your example that table would look like this:

    ID Extension 1 accdb 2 ACCDB 3 accdb 4 accdb 5 accdb 5 accde 5 zip 6 accdb 6 bat 6 docx 6 laccdb 6 txt 7 accdb 7 bat 7 laccdb 7 sql 7 xlsx

  4. Now your data is relational -- you can do queries and join to this new table and then join to your "other table"

Hogan
  • 69,564
  • 10
  • 76
  • 117
1

It's hard to guess what you want to do, even though you posted some (partial) screenshots.

As Hogan mentioned, the data is hard to query because its structure is not relational. It is possible however. Like this:

SELECT *,
 STUFF((SELECT ', ' + se.Description AS [text()]
        FROM SourceExtensions AS se
        WHERE ',' + fdl.Extensions + ',' LIKE '%,' + se.Extension + ',%'
        FOR XML PATH('')), 1, 2, '') AS Description
FROM FinalDirectoryListing AS fdl

This complex FOR XML trick with a subquery is the SQL Server way of aggregating strings by concatenation. The LIKE finds the separate extensions in the comma separated list. The STUFF(..., 1, 2, '') finally removes the before the first element.

Is this what you are looking for?

Community
  • 1
  • 1
Michel de Ruiter
  • 7,131
  • 5
  • 49
  • 74
  • This really does not help since he needs to parse a comma separated list and use it to join -- your example allows aggregation of a group -- basically you are showing the reverse. – Hogan Aug 31 '16 at 15:30
  • I guess you understand the OP's question better. This is only how I read it. The `WHERE ',' + fdl.Extensions + ',' LIKE '%,' + se.Extension + ',%'` part does parse the comma separated list and also works in a `JOIN`. – Michel de Ruiter Sep 01 '16 at 07:40
  • That won't work, you also have to account for comma only before, comma only after, no comma. Many cases. better to do what I said and make a process to make a list that you can add an index to. – Hogan Sep 01 '16 at 14:28
  • That's what the `',' +` and `+ ','` are for. Try it, works fine. Sometimes one can't change the data generation process. – Michel de Ruiter Sep 02 '16 at 08:30
  • ok I see what you did... actually kinda cute. for small data sets it would work well. +1 – Hogan Sep 02 '16 at 17:57