2

I am creating a package in SSIS, and want to convert a file with one large column into multiple columns.

I have a table containing several rows with a single column of raw data. The data was copied from a notepad file, and each row contains pipe delimiters to separate each column, but because it is a notepad file, each row is copied as one large column. I want to convert each column per row to multiple columns based on their start/end positions.

I tried using SSIS Derived Column Transformation with the SUBSTRING function, but the Data Type is automatically populated as text stream[DT_TEXT], and I get the following error:

Error at [Derived Column[113]]; The function “SUBSTRING” does not support the data type “DT_TEXT” for parameter number 1. The type of the parameter could not be implicitly cast into a compatible type for the function. To perform this operation, the operand needs to be explicitly cast with a cast operator.

Error at [Derived Column[113]]; Evaluating function 'SUBSTRING' failed with error code 0xC0047089.

Error at [Derived Column[113]]; Computing the expression "SUBSTRING[RawData],1,5)" failed with error code 0xC00470C5. The expression may have errors, such as divide by zero, that cannot be detected at parse time, or there may be an out-of-memory error.

Error at [Derived Column[113]]; The expression "SUBSTRING[RawData], 1,5)" on "Derived Column.Outputs[Derived Column Output].Coluns[Derived Column 1] is not valid

Error at [Derived Column[113]]; Failed to set property "Expression" on "Derived Column.Outputs[Derived Column Output].Columns[Derived Column 1]. "

When I review other Derived Column Transformation illustrations utilizing SUBSTRING with a file containing individual columns, I notice the Data Type is shown as DT_WSTR.

Do I need to convert to this Data Type? If so, how do I explicitly cast DT_TEXT data types to DT_WSTR with a cast operator in SSIS Derived Column Transformation?

Otherwise, how else could I handle this conversion?

Derived Column Name: EmployerNo
Derived Column:      Replace 'RawData'
Expression:          SUBSTRING( [RawData], 1, 5 )
Data Type:           text stream[DT_TEXT]

I expect the RawData column to be split up (converted) into 8 different columns based on their start and end positions.

Hadi
  • 36,233
  • 13
  • 65
  • 124
Sast77
  • 97
  • 1
  • 9

1 Answers1

1

Refering to SUBSTRING (SSIS Expression) documentation:

Returns the part of a character expression that starts at the specified position and has the specified length.

You have to convert DT_TEXT column to DT_STR/DT_WSTR before using Substring() function, you can do this using a Script Component, you can use a similar function:

string BlobColumnToString(BlobColumn blobColumn)
{
    if (blobColumn.IsNull)
        return string.Empty;

    var blobLength = Convert.ToInt32(blobColumn.Length);
    var blobData = blobColumn.GetBlobData(0, blobLength);
    var stringData = Encoding.Unicode.GetString(blobData);

    return stringData;
}

Or if the DT_TEXT length doesn't exceed the DT_STR length limit try using the following SSIS expression:

SUBSTRING( (DT_STR,1252,4000)[RawData], 1, 5 )
Hadi
  • 36,233
  • 13
  • 65
  • 124
  • Thanks for responding, Hadi. I will try both proposed solutions when I get to work on Monday and let you know my results. Just a question about the 1st solution. Where would I insert the "string BlobColumnToString(BlobColumn blobColumn)" expression, would I need to include a Script task, and is this C# coding? – Sast77 Aug 10 '19 at 20:52
  • You have to define this function within a script component and use it as following: `Row.StrColumn = BlobColumnToString(Row. BlobColumn) ;` – Hadi Aug 10 '19 at 20:56
  • Hi Hadi,I tried using your suggested solution of: SUBSTRING( (DT_STR,1252,4000) [RawData], 1, 5 ) but got a similar error as mentioned in by first post along with error code of 0xC00470AA. Then I replaced it with: SUBSTRING( (DT_WSTR,1252,4000) [RawData], 1, 5 ) and still got a similar error message, except that the second error message did not give me an error code. Also, I don't quite understand where to place the follow statement with the Script Component: Row.StrColumn = BlobColumnToString(Row. BlobColumn) ; Also, what is blockLength, blobColumn, blobData and stringData? – Sast77 Aug 12 '19 at 20:12
  • @SoniaScott try using a data conversion transformation to convert to dt_wstr then add a derived column to extract Substring . I think it should work – Hadi Aug 12 '19 at 21:45
  • Hi Hadi, I converted to DT-STR in a previous conversion step, and was successfully able to use the SUBSTRING in a Derived Column step to create several columns with Data Type DT_WSTR using the following statement: SUBSTRING([Copy of RawData],1,5). But I have another problem. The file is variable length and delimited by the pipe "|" delimiter. Is it possible to separate each column using Derived Column Transformation? – Sast77 Aug 14 '19 at 19:40
  • @SoniaScott you can use `TOKEN()` and `TOKENCOUNT()` functions – Hadi Aug 14 '19 at 19:52
  • @SoniaScott check this example https://stackoverflow.com/questions/57435110/treating-a-tab-delimted-column-as-a-bulk-insert-in-ssis/57436292#57436292 – Hadi Aug 14 '19 at 19:59
  • Thanks for suggesting the TOKEN() function. I used code: TOKEN([Copy of RawData],"\\|",1), and it worked perfectly for 1 file. For the other file, I converted the data from DT_TEXT to DT_STR but got message: Error: 0xC002F446 at 14a-Copy Rawdata to OLEDBTable, Dest [81]: An error occurred while setting up a binding for the "Empl Id" column. The binding status was "DT_TEXT". The data flow column type is "DBBINDSTATUS_UNSUPPORTEDCONVERSION". The conversion from the OLE DB type of "DBTYPE_IUNKNOWN" to the destination column type of "DBTYPE_VARCHAR" might not be supported by this provider. – Sast77 Aug 15 '19 at 21:17
  • @SoniaScott it would be better to ask a new question where you mention the message, screenshots. – Hadi Aug 15 '19 at 21:51
  • Hi Hadi, I see the problem now but don't know how to resolve it. The previous file had nvarchar fields, and worked perfectly when converted to Unicode string [DT_WSTR]. The new file has varchar fields and gives error "Columns "Empl ID" and "Empl id" cannot convert between Unicode and non-Unicode string data types." Right-clicking the task, going to Advanced Editor, then Input and Output Properties, External Columns, I notice the Data Types were already string[DT-STR]. When I tried changing them to Unicode string[DT_WSTR], they did the take the change. What can I do to resolve this error? – Sast77 Aug 16 '19 at 15:54
  • @SoniaScott you need to ask a new question and to provide an example of each file content and to show some screenshots of the package data flow task + error message. I cannot help you with just providing new error messages every time. – Hadi Aug 16 '19 at 15:59
  • OK, I will start a new question. Just need to figure out how to include screen shots :). Thanks again. – Sast77 Aug 16 '19 at 16:09
  • @Sast77 check the following meta link to read on how to add screenshots https://meta.stackoverflow.com/questions/344851/how-do-you-add-a-screenshot-image-to-your-stack-overflow-post – Hadi Aug 16 '19 at 16:11
  • @Sast77 also if my answers solved the problem mentioned in this question, you can [mark it as accepted to close this one](https://www.stackoverflow.com/tour) – Hadi Aug 16 '19 at 16:12
  • I got it to work. Thanks you. First I had to understand the difference between Unicode (DT_WSTR or nvarchar) and non-Unicode (DT_STR or varchar). I had to do 5 dataflow steps in single ControlFlow to achieve copy of the varchar file. Step 1: OLE DB RawData source table. Step 2: Convert all columns from DT_TEXT to DT_STR to get DataType DT_WSTR in Derived Column dataflow. Step 3 Derived Column: Use Expression "TOKEN([Copy of RawData],"\\|",1)" to separate delimited columns. Step 4: Convert from Unicode DT_WSTR to non-Unicode DT_STR. Step 5: Map to (non-Unicode) OLE DB Destination Table. – Sast77 Aug 16 '19 at 18:04
  • @Sast77 then no more need for a new question. Congrats – Hadi Aug 16 '19 at 18:06