I am trying to get unique source, medium, campaign using STRING_AGG function partitioning by fullVisitorId and ORDER BY visitNumber. However, I am getting a lot of repeated fields in Source, medium and campaign using STRING_AGG.
So I thought of using DISTINCT but it doesn't work with STRING_AGG.
Any advice appreciated.
Code:
SELECT
CAST(concat(SUBSTR(date,0,4), '-', SUBSTR(date,5,2),'-',SUBSTR(date,7,2)) as date ) as date,
fullVisitorId,
trafficSource.keyword,
trafficSource.adContent,
STRING_AGG(channelGrouping), ' > ') OVER (PARTITION BY fullVisitorId ORDER BY visitNumber) AS channelPath,
STRING_AGG(trafficSource.medium), ' > ') OVER (PARTITION BY fullVisitorId ORDER BY visitNumber ) AS medium_Path,
STRING_AGG(trafficSource.source), ' > ') OVER (PARTITION BY fullVisitorId ORDER BY visitNumber) AS source_Path,
STRING_AGG(trafficSource.campaign), ' > ') OVER (PARTITION BY fullVisitorId ORDER BY visitNumber) AS campaign_Path
FROM
`tfa-big-query.74006564.ga_sessions_*`,UNNEST(hits) as hit
WHERE _table_suffix BETWEEN '20180801' and '201808015'
The results in the STRING_AGG column (channelgrouping, medium and source is the occurrence of repetitive fields which can be seen as follows:
Actual results:
Organic Search > Organic Search > Organic Search < Other < Other < Direct < Owned < email < email < (none) < (none)
Expected Output:
Organic Search < Other < Direct < Owned < email < (None)