10

Is there any option to create a custom Profile Request for SSIS Data Profiling Task?

At the moment there are 5 standard profile requests under SSIS Data Profiling task:

  1. Column Null Ratio Profile Request
  2. Column Statistics Profile Request
  3. Column Length Distribution Profile Request
  4. Column Value Distribution Profile Request
  5. Candidate Key Profile Request

I need to add another one (Custom one) to get summary of all numeric values.

Thanks in advance for your helps.

Hadi
  • 36,233
  • 13
  • 65
  • 124
Barsham
  • 749
  • 8
  • 30
  • 2
    In SSIS, i think you should write your own queries using Execute SQL Task – Yahfoufi Aug 10 '18 at 09:49
  • Thanks @Yahfoufi for your comment.Although the question was specific to SSIS Data Profiling Task. – Barsham Aug 12 '18 at 00:35
  • 1
    but SSIS data profiling task has many limitations, it is designed for some specific profiles only. For custom profiles you need to go for some workaround – Yahfoufi Aug 13 '18 at 11:22
  • 1
    i don't think you will get another answer than these answers because Data profiling Task has a specific and limited job. – Yahfoufi Aug 16 '18 at 08:12

2 Answers2

7

Based on this Microsoft Documentation, SSIS Data profiling Task has only 5 main profiles (listed on your question) and there is no option to add a custom profile.

For a similar reason, i will create an Execute SQL Task to achieve that, you can use the aggregate functions you need and ISNUMERIC function in the where clause :

SELECT MAX(CAST([Column] AS BIGINT)) -- Maximum value
       ,MIN(CAST([Column] AS BIGINT)) -- Minimum value
       ,COUNT(Column) -- Count values
       ,COUNT(DISTINCT [Column]) -- Count distinct values
       ,AVG(CAST([Column] AS BIGINT)) -- Average
       ,SUM(CAST([Column] AS BIGINT)) -- Sum
FROM TABLE
WHERE ISNUMERIC([Column]) = 1
Hadi
  • 36,233
  • 13
  • 65
  • 124
2

I think what you want to do here is create a computed column that is populated with your source column only if IsNumeric(SourceColumn) = 1.

Then create a profile task using Column Value Distribution Profile Request on the computed column, with ValueDistributionOption set to AllValues.

Edit:

To further clarify, the computed column doesn't have to be a task in SSIS, although that's how I was thinking about it when I came up with my answer. You could ALTER the table you want to profile, adding the computed column, and then create the Profile Task as I explained above.

I was also under the assumption that you wanted to profile the values of a single column. If you're wanting to do this for multiple columns, or need to profile the summary values aggregated from details records, then this answer may not be the best solution.

digital.aaron
  • 5,435
  • 2
  • 24
  • 43
  • Can you please elabourate more? I have worked with all the component that you are talking about although I don't know how you are joing them together. Computed column on a details table to calculate summary of entire table for each single field? Is this even practical? – Barsham Aug 12 '18 at 00:45
  • @Barsham if you decide to work with this answer, create a persisted computed column with a case condition https://blog.sqlauthority.com/2016/04/27/sql-server-computed-column-conditions-case-statement/ – Hadi Aug 12 '18 at 08:06
  • @Barsham `ALTER TABLE [My_Table] ADD OnlyNumCol AS CAST(Case When ISNUMERIC([Column]) = 1 Then [Column] ELSE NULL END AS BIGINT) PERSISTED` – Hadi Aug 12 '18 at 08:08
  • @Barsham then you can use SSIS profile Task. But if you are asking for a custom profile (not using current profiles only on numeric values) then you should go the `Execute SQL Task` as i mentioned in my answer – Hadi Aug 12 '18 at 08:10
  • 3
    It is not a good idea to add a Computed column to the table for profiling purposes only. So if you have many criteria you need to add many computed columns. It decreases the performance or it increase the size. I think it is better to go for `Execute SQL Task` – Yahfoufi Aug 13 '18 at 11:25
  • Even if i don't recommend using computed columns, but i think that the OP is looking for workarounds. So the answer doesn't deserve down-voting. +1 – Hadi Aug 15 '18 at 18:06
  • 1
    My thinking here was that this profile would be a one-off type of operation, and that using a computed column would be the quickest and easiest way to circumvent the limited options built into SSIS. – digital.aaron Aug 15 '18 at 22:26