1

I have searched many places and rarely use forums for posting... So I believe I have a rare issue here. I have a csv file that contains a one to many relationship. In the form of say the first 10 fields contain the one record. This continues to a blank field and then within each additional field is another delimiter being a pipe that has fields within it. This is actually a log file of events that occured relating to a single call.

For example.

ID;MACHINEID;ANI;;COMMAND|ARGUMENTS|20120112.06:15:32

The bolded text can repeat many times or a few. This is the many side of the join on SQL. I need to get this back into SQL. I am thinking SSIS Script Block. Where I loop and add to the many table with the ID from the one side. Splitting the internal fields with the pipe as delimiter. I just wonder if anyone has seen this before and has a simple solution that maybe I don't know about. I would imagine that this would have been better created in XML but alas that is not the case.

Any help is appreciated. I will even take criticism if it is mostly constructive. Thanks so much in advance.

To show table makeup

CREATE TABLE [dbo].[tblIVRCalls](
    [CALLID] [char](50) NOT NULL,
    [MACHINEID] [char](50) NOT NULL,
    [CHANNEL] [char](50) NOT NULL,
    [DNIS] [char](50) NOT NULL,
    [ANI] [char](50) NOT NULL,
    [STARTTIME] [smalldatetime] NOT NULL,
    [TRUNK] [char](50) NOT NULL,
    [APPLICATION] [char](50) NULL,
    [PLANID] [char](50) NULL,
    [DERIVEDID] [char](50) NULL,
    [TOTALTIME] [smalldatetime] NOT NULL,
 CONSTRAINT [PK_tblIVRCalls] PRIMARY KEY CLUSTERED 

CREATE TABLE [dbo].[IVRCallActions](
    [pk] [bigint] IDENTITY(1,1) NOT NULL,
    [fkCALLID] [char](50) NOT NULL,
    [SequenceNumber] [bigint] NOT NULL,
    [Command] [char](50) NOT NULL,
    [Arguments] [varchar](max) NOT NULL,
    [ExecutionTime] [datetime] NOT NULL,
 CONSTRAINT [PK_IVRCallActions] PRIMARY KEY CLUSTERED 
dstigue
  • 31
  • 10
  • SSIS works in a row-wise way, so don't expect to solve this with SSIS. It should be processed with a simple C# program – JotaBe May 02 '12 at 15:43
  • 1
    @JotaBE: How would you implement this in C# in a way that didn't iterate over the rows? – JAQFrost May 02 '12 at 16:11
  • What do your tables look like? For your given example, what's the expected outcome? 1 row in table1, N rows in table2? Will information always be in three part blocks or is that variable too? – billinkc May 02 '12 at 17:23
  • [link](http://sqlblog.com/blogs/andy_leonard/archive/2010/05/18/ssis-design-pattern-loading-variable-length-rows.aspx) Trying to use this method to use C# code and choose it's output based on record. I am thinking for each record running an A output for non repeating and a B record for repeating with the foreign key captured in a variable. – dstigue May 02 '12 at 20:00

3 Answers3

0

One option is to import the data into a staging table, where the repeated element is stored as a varchar(max) (or nvarchar(max) for the purists).

Then do the following:

  1. Insert the non-repeating fields into table 1.
  2. Capture the identity of the row just inserted, using @@Identity
  3. Loop through the repeating field, extracting first the rows (defined by some delimiter) and then the components of the row (defined by "|").
  4. Insert each row into table 2, with the identity from (2).

I admit that step (3) is not the easiest thing. However, if you write a stored procedure and use the split function (http://stackoverflow.com/questions/2647/split-string-in-sql), then it is feasible.

In my experience, complex string manipulations work okay up through hundreds of megabytes. If you need to process multiple gigabytes at a time, then you'll probably want custom code.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I know the split function from VBA and C# I would imagine it's similar. I will take a look at the link you provided. How do you nested loop in SQL. I am not that familiar. Would it be a recursive query like a CTE. I am not that familiar with CTEs but I imagine they sound like they fit the bill. – dstigue May 02 '12 at 19:40
0

I'd recommend creating a console program (with C#, vbs or whatever you can use) to get 2 files from the original file.

Then you could process both files in SSIS, which is very good at managing empty fields, change data types and so no.

The program would simply read the original file line by line, would process it to get the parent row and the related children rows, and write the parents in one file and the children in the other. Those files would be the entry for the SSIS package.

What not to do:

  • Making this kind of process is much more difficult in T-SQL, which hasn't many functions that allow to process strings.
  • Making this in SSIS is not possible. There's no component that can decompose a row in several rows. Not to mention that it should have to produce two different kind of rows. (A custom SSIS component would do it, but it's not worth the effort)
JotaBe
  • 38,030
  • 8
  • 98
  • 117
  • C# can be added to SSIS via Script Blocks. [SSIS Design Pattern: Loading Variable-Length Row](http://sqlblog.com/blogs/andy_leonard/archive/2010/05/18/ssis-design-pattern-loading-variable-length-rows.aspx) shows a little of how this can be done. I will have to loop and create. I was hoping there was an easier way but this seems to be a complex problem. – dstigue May 03 '12 at 14:20
  • Surprising. That's absolutely true. I couldn't ever find the way to create several buffer outputs in a sibgle script component. great solution! – JotaBe May 04 '12 at 18:48
0

Based upon the information found at SSIS Design Pattern: Loading Variable-Length Rows I was able to create the below procedure. The SSIS Script Block can do anything you can do with C# and can be installed to SQL Server to run at set intervals. Created 2 outputs for processing. You fill the Buffer for each one through this script. So one flat file in with one column. 2 outputs and this script in between.

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        IVRCallsBuffer.AddRow();
        string[] splitEntireRow = null;
        var byteRow = Row.Column0.GetBlobData(0,(int)Row.Column0.Length);
        var entireRow = System.Text.Encoding.ASCII.GetString(byteRow);
        splitEntireRow = entireRow.Split(';');
        int fieldPosition = 0;
        string CallID = "";
        while (splitEntireRow[fieldPosition] != "")
        {
            var splitIVRCall = splitEntireRow[fieldPosition].Split('=');
            switch (splitIVRCall[0])
            {
                case "CALLID":
                    IVRCallsBuffer.CALLID = splitIVRCall[1];
                    CallID = splitIVRCall[1];
                    break;
                case "MACHINEID":
                    IVRCallsBuffer.MACHINEID = splitIVRCall[1];
                    break;
                case "CHANNEL":
                    IVRCallsBuffer.CHANNEL = splitIVRCall[1];
                    break;
                case "DNIS":
                    IVRCallsBuffer.DNIS = splitIVRCall[1];
                    break;
                case "ANI":
                    IVRCallsBuffer.ANI = splitIVRCall[1];
                    break;
                case "STARTTIME":
                    IVRCallsBuffer.STARTTIME = DateTime.ParseExact(splitIVRCall[1], "yyyyMMdd.HH:mm:ss.fff", System.Globalization.CultureInfo.CurrentUICulture);
                    break;
                case "TRUNK":
                    IVRCallsBuffer.TRUNK = splitIVRCall[1];
                    break;
                case "APPLICATION":
                    IVRCallsBuffer.Application = splitIVRCall[1];
                    break;
                case "PLANID":
                    IVRCallsBuffer.PLANID = splitIVRCall[1];
                    break;
                case "DERIVEDID":
                    IVRCallsBuffer.DERIVEDID = splitIVRCall[1];
                    break;
                case "TOTALTIME":
                    IVRCallsBuffer.TOTALTIME = DateTime.ParseExact(splitIVRCall[1],"mm:ss.fff",System.Globalization.CultureInfo.CurrentUICulture);
                    break;   
            }
            fieldPosition++;
        }
        fieldPosition++;
        uint sequence = 1;
        while (splitEntireRow.GetUpperBound(0) -1 > fieldPosition)
        {
            IVRCallActionsBuffer.AddRow();
            var splitIVRCallAction = splitEntireRow[fieldPosition].Split('|');
            IVRCallActionsBuffer.fkCALLID = CallID;
            IVRCallActionsBuffer.SequenceNumber = sequence;
            IVRCallActionsBuffer.Command = splitIVRCallAction[0];
            IVRCallActionsBuffer.Arguments = splitIVRCallAction[1];
            IVRCallActionsBuffer.ExecutionTime = DateTime.ParseExact(splitIVRCallAction[2],"yyyyMMdd.HH:mm:ss.fff", System.Globalization.CultureInfo.CurrentUICulture);
            fieldPosition++;
            sequence++;
        }

    }
}
dstigue
  • 31
  • 10