0

I apologize if some of the code below is superfluous, but I am wanting to paint a picture of everything I am trying to do.

My intention is to

  1. Select an element from a json array called pickupLocations. There are many pickupLocations for every id_referral. See the CTE named raw_addresses below., and

  2. Use a string aggregation function (e.g. string_agg) to create a delimited field of addresses for each id_referral.

In other words, I want to be able to go from

#id_referral    pickup_addresses
#1  4265 Hillsdale Ave. NE, Grand Rapids, MI 49525
#1  3060 Cheney Ave. NE, Grand Rapids, MI 49525
#2  805 Kendalwood St. NE, Grand Rapids, MI 49505
#2  711 Edgewood St. NE, Grand Rapids, MI 49505

in my raw_addresses CTE

to

#id_referral    pickup_addresses
#1  4265 Hillsdale Ave. NE, Grand Rapids, MI 49525 | 3060 Cheney Ave. NE, Grand Rapids, MI 49525
#2  805 Kendalwood St. NE, Grand Rapids, MI 49505 | 711 Edgewood St. NE, Grand Rapids, MI 49505

in my final select.

However, I am now getting the following results

#id_referral    pickup_addresses
#1  4265 Hillsdale Ave. NE, Grand Rapids, MI 49525 | 4265 Hillsdale Ave. NE, Grand Rapids, MI 49525
#2  805 Kendalwood St. NE, Grand Rapids, MI 49505 | 805 Kendalwood St. NE, Grand Rapids, MI 49505

When using the code below.

 WITH raw_addresses AS (
         SELECT sr.id AS id_referral,
        --parsing a json array which has many pickupLocations for a single id_referral
            json_array_elements(sr."pickupLocations") -> 'pickupLocationAddress' AS pickup_address
           FROM "ServiceReferrals" sr
        )
--want to roll the pickup addresses into a single pipe-delimited field (willing work through an array too as shown with array_agg, but same problem there)
 SELECT raw_addresses.id_referral,
    string_agg(cast(pickup_address as varchar(100)), '|') AS pickup_addresses
   FROM raw_addresses
  GROUP BY raw_addresses.id_referral

For some reason, the string_agg function is repeating the first value for each id_referral. I have also tried agg_array and get the same behavior.

Any thoughts as to why would be much appreciated.

joemienko
  • 2,220
  • 18
  • 27
  • Just to double-check, your CTE for `raw_addresses` actually produces the output you've described (eg. you did a `select * from raw_addresses` to get that output)? – Gerrat May 25 '16 at 21:46
  • @gerrat, no - the CTE behaves as expected. I am getting a, `pickupLocationAddress` as an element from `pickupLocations` (many for each `id_referral`). The final result only comes from the final select. `string_agg` is behaving somewhat as expected - it's just repeating the "first" `pickupLocationAddress` (delimited by pipes) instead of delimiting each unique `pickupLocationAddress` – joemienko May 25 '16 at 21:56
  • If anyone is trying to implement this for analysis in R (which I am), I came across this answer which gets the job done with dplyr http://stackoverflow.com/a/28752857/2364325 – joemienko May 26 '16 at 00:52

1 Answers1

0

You could try using an intermediate CTE:

WITH raw_addresses AS (
     SELECT sr.id AS id_referral,
    --parsing a json array which has many pickupLocations for a single id_referral
        json_array_elements(sr."pickupLocations") -> 'pickupLocationAddress' AS pickup_address
       FROM "ServiceReferrals" sr
), transformed_addresses AS (
    SELECT id_referral, 
    cast(pickup_address as varchar(100)) AS pickup_address
    FROM raw_addresses
)
SELECT id_referral,
string_agg(pickup_address, '|') AS pickup_addresses
FROM transformed_addresses
GROUP BY id_referral
Gerrat
  • 28,863
  • 9
  • 73
  • 101
  • Thanks! I don't have remote access to this DB, but I'll check this out tmr. – joemienko May 26 '16 at 00:51
  • I am still getting the same problem when I run this code on my DB. That said, as shown here: http://sqlfiddle.com/#!15/e9839/3, your code does work as expected so I am accepting the answer. My specific problems must be a non-obvious problem with my specific DB setup or table that I am unable to reproduce at this time. As stated above, I have solved this problem with a non-SQL solution for now. – joemienko May 28 '16 at 20:53