I have a process where I take data in from one resource (source) and output it into a cleaner file (destination). I use a translation or mapping table (map) to categorize the data from the source before it gets put into the destination. The process flow looks like this:

Source --> Map --> Destination.

My problem is that the source file has a field that, at times, contains multiple values separated by semicolons. For example, the field (called Type) might contain the value 'Lemon; Orange; Grape'.

My Map table contains only certain values that I want to map between source and destination. In the above example, I would only want to map the Lemon and Orange values to the destination. The Map table looks like this:

| Source | Destination |
|--------|-------------|
| Lemon  | Citrus      |
| Orange | Citrus      |

I keep these values as separate rows in the Map because, in the source table, the Type field can contain a single value ('Lemon', 'Orange', or 'Grape') or multiple values as mentioned above ('Lemon; Orange; Grape').

Is it possible to extract just the Lemon and Orange values from the source when there are multiple values in the Type field, through a join or some other way? I do not want to put every combination of Type values from my Source table into the Map table, especially since the order of the values is not consistent.

My goal is to map each value in the 'Type' field from the source to the proper field in the destination, even when the field contains multiple types.
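
To make the setup concrete, here is a minimal sketch of the tables described above (the names and types are illustrative assumptions, not the actual schema):

```sql
-- Illustrative schema only; the real tables may use different names and types.
CREATE TABLE Source      (Id int IDENTITY(1, 1), [Type] varchar(200));
CREATE TABLE Map         ([Source] varchar(50), [Destination] varchar(50));
CREATE TABLE Destination (SourceId int, [Destination] varchar(50));

INSERT INTO Source ([Type]) VALUES ('Lemon'), ('Lemon; Orange; Grape');
INSERT INTO Map ([Source], [Destination]) VALUES ('Lemon', 'Citrus'), ('Orange', 'Citrus');
```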

  • The biggest issue here is that you are storing multiple values in a delimited string. This violates 1NF and is painful to work with. Ideally you should normalize this, but if that isn't possible you will need to split your string. The technique for this varies with the version of SQL Server you are using. – Sean Lange Sep 17 '18 at 15:32
  • The whole problem here is solved by fixing your schema so you don't store delimited data in a single field. Really, unless that is fixed, you are going to experience major pain every time you attempt to interact with this column. – JNevill Sep 17 '18 at 15:32
  • @SeanLange - Yes, the problem with the source data is that I do not have control over how it is created. I import this data into the source table. – user7577070 Sep 17 '18 at 16:16
  • @JNevill See the above comment: I do not have control over the way the data is stored; it is imported. – user7577070 Sep 17 '18 at 16:17
  • Understood. I think the best route here is to have a view or downstream table where the records are properly exploded/split into multiple records (one for each delimited piece of data in each source record's delimited column). From that result you can join as normal to these other tables. [A good example of how to do that using a recursive CTE is here](https://stackoverflow.com/questions/5493510/turning-a-comma-separated-string-into-individual-rows). If the original table is updated on a regular cadence (nightly), then you can run this downstream logic afterwards to have more normalized data. – JNevill Sep 17 '18 at 16:22
  • If you can go that route, then the resulting table can be indexed on the new field as well to help this join be more performant. You can treat your original table as more of a *step* in the ETL to this final table. – JNevill Sep 17 '18 at 16:26
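
Following Sean Lange's point that the splitting technique depends on the SQL Server version: on SQL Server 2016 and later, the built-in STRING_SPLIT function can explode the delimited Type field into one row per value, which then joins to the Map table like any normalized column. A minimal sketch, using the illustrative tables from the question:

```sql
-- Explode 'Lemon; Orange; Grape' into one row per value, then join to Map.
-- Values with no Map entry (e.g. 'Grape') simply drop out of the inner join.
-- LTRIM/RTRIM removes the space that follows each semicolon.
SELECT s.Id, m.[Destination]
FROM Source AS s
CROSS APPLY STRING_SPLIT(s.[Type], ';') AS split
INNER JOIN Map AS m
    ON m.[Source] = LTRIM(RTRIM(split.value));
```

For the row 'Lemon; Orange; Grape' this returns two rows, both mapped to 'Citrus', regardless of the order in which the values appear.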
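On older versions, or to follow JNevill's suggestion of materializing a downstream table as an ETL step, a recursive CTE (the technique in the linked answer) can do the splitting, and the result can then be indexed for the join. Another sketch against the same illustrative tables; SourceExploded is an assumed name for the downstream table:

```sql
;WITH Split AS (
    -- Anchor: peel the first value off the front of the delimited string.
    SELECT Id,
           CAST(LTRIM(RTRIM(LEFT([Type], CHARINDEX(';', [Type] + ';') - 1))) AS varchar(200)) AS Value,
           CAST(STUFF([Type], 1, CHARINDEX(';', [Type] + ';'), '') AS varchar(200)) AS Rest
    FROM Source
    WHERE [Type] IS NOT NULL
    UNION ALL
    -- Recursive step: keep peeling until the remainder is empty.
    SELECT Id,
           CAST(LTRIM(RTRIM(LEFT(Rest, CHARINDEX(';', Rest + ';') - 1))) AS varchar(200)),
           CAST(STUFF(Rest, 1, CHARINDEX(';', Rest + ';'), '') AS varchar(200))
    FROM Split
    WHERE Rest <> ''
)
SELECT Id, Value
INTO SourceExploded        -- one row per (source row, single Type value) pair
FROM Split;

-- Index the split-out value so the join to Map stays cheap as the data grows.
CREATE INDEX IX_SourceExploded_Value ON SourceExploded (Value);
```

If the source is re-imported on a regular cadence (nightly, say), rebuilding SourceExploded afterwards keeps it in sync, and the join to Map then runs against SourceExploded.Value exactly as in the STRING_SPLIT example.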

0 Answers