- You need to use a combination of a derived column transformation, a sort transformation, and an aggregate transformation to get the desired result. The starting data I have has two columns, `id` (short) and `con` (string), matching the source projection in the script further down.
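
The values below are purely illustrative (only the shape matters); they are also reused in the worked example for the aggregate expression further down:

| id | con |
|----|-----|
| 1  | A   |
| 2  | B   |
| 3  | C   |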

- I have added a new column `tp` with a static value `1`. This can be used to group the data as required; the corresponding script line is shown just below.
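
In data flow script form (it also appears in the `scriptLines` of the JSON at the end of this answer), the step is:

```
source1 derive(tp = 1) ~> derivedColumn1
```

Because `tp` is the same constant `1` on every row, grouping on it later collapses all rows into a single group.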

- Now sort the data on the required column (`con` in this case) using a sort transformation; the script line for this step is shown below.
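
The equivalent script line sorts ascending on `con`, case-insensitively:

```
derivedColumn1 sort(asc(con, true),
     caseInsensitive: true) ~> sort1
```

Sorting before the aggregate fixes the order in which the values are concatenated.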

- Now, using an aggregate transformation, group by the `tp` column. In the Aggregates tab, use the required column with the following expression to get the desired result (a worked example follows below):

`left(substring(toString(reduce(collect(con), '', #acc+';'+#item, #result)),2),100)`
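
As an illustration of what the expression evaluates to, take the sample rows above with `con` values `A`, `B`, `C` (note that `substring` in data flow expressions is 1-based):

```
collect(con)                                        -> ['A','B','C']
reduce(['A','B','C'], '', #acc+';'+#item, #result)  -> ';A;B;C'
substring(';A;B;C', 2)                              -> 'A;B;C'    (drops the leading ';')
left('A;B;C', 100)                                  -> 'A;B;C'    (caps the result at 100 characters)
```

`toString` simply ensures the reduce result is treated as a string before `substring`; increase the `100` in `left` if the concatenated value can be longer than that.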

- The following is the entire data flow JSON:
{
    "name": "dataflow1",
    "properties": {
        "type": "MappingDataFlow",
        "typeProperties": {
            "sources": [
                {
                    "dataset": {
                        "referenceName": "DelimitedText1",
                        "type": "DatasetReference"
                    },
                    "name": "source1"
                }
            ],
            "sinks": [
                {
                    "name": "sink1"
                }
            ],
            "transformations": [
                {
                    "name": "derivedColumn1"
                },
                {
                    "name": "sort1"
                },
                {
                    "name": "aggregate1"
                }
            ],
            "scriptLines": [
                "source(output(",
                " id as short,",
                " con as string",
                " ),",
                " allowSchemaDrift: true,",
                " validateSchema: false,",
                " ignoreNoFilesFound: false) ~> source1",
                "source1 derive(tp = 1) ~> derivedColumn1",
                "derivedColumn1 sort(asc(con, true),",
                " caseInsensitive: true) ~> sort1",
                "sort1 aggregate(groupBy(tp),",
                " con = left(substring(toString(reduce(collect(con), '', #acc+';'+#item, #result)),2),100)) ~> aggregate1",
                "aggregate1 sink(validateSchema: false,",
                " skipDuplicateMapInputs: true,",
                " skipDuplicateMapOutputs: true,",
                " store: 'cache',",
                " format: 'inline',",
                " output: false,",
                " saveOrder: 1) ~> sink1"
            ]
        }
    }
}