1

I have a Redshift table with an example row with the following structure:

id                url
12345             http://www.things.com/details/?foo=hello&bar=world&baz=John+Smith
45678             http://www.things.com/details/?foo=hello&bar=america&booz=Howard+Jones&other_field=Portugal

I would like to extract everything after the ? in the url and for every &, add the value on the left of the = to a keys column and the value on the right to a values column. There is no determinate number of &'s in a given URL. The desired output is as follows:

id               key          value
12345            foo          hello
12345            bar          world
12345            baz          John+Smith
45678            foo          hello
45678            bar          america
45678            booz         Howard+Jones
45678            other_field  Portugal

My solution right now is to pick a reasonably high number and to write a Python script that writes the same query with a new UNION ALL for each integer. I parse out the desired fields using SPLIT_PART(SPLIT_PART(SPLIT_PART(url, '?', 2), '&', {i}), '=', 1) and SPLIT_PART(SPLIT_PART(SPLIT_PART(url, '?', 2), '&', {i}), '=', 2) during each iteration.

Brandon Sherman
  • 673
  • 1
  • 8
  • 25

1 Answers1

0

It's a duplicate of a few other questions: splitting rows in Redshift , Split values over multiple rows in RedShift.

I don't think it's possible in Redshift easily, as AFAIK it doesn't have functions that convert a single row to multiple rows.

The related questions I mentioned also provide a few helpful tricks that might help. I think the best is one is joining with a table of sequence numbers (without any join predicates) and using that number for SPLIT_PART, possibly with a filter to eliminate rows that do not match.

Some other systems have functions for that, e.g. Postgres has regexp_split_to_array and regexp_split_to_table, and Snowflake has SPLIT + FLATTEN (disclaimer - I work on this system).

Community
  • 1
  • 1
Marcin Zukowski
  • 4,281
  • 1
  • 19
  • 28