0

I am working with a legacy hive table where a field "active_countries" stores a list of values

e.g.

active_countries = [US, CA, MX]

How do I group on the "active_countries", and do a count(*) while tracking individual entries.

e.g. If my table contains

date | active_countries
Mar-12 | [US, CA, MX]
Mar-13 | [US, CA]

When I do a GROUP BY date, active_countries, I want to see something of the following format

date | active_country
Mar-12 | US
Mar-12 | CA
Mar-12 | MX
Mar-13 | US
Mar-13 | CA
user462455
  • 12,838
  • 18
  • 65
  • 96
  • First you would need this [Turning a Comma Separated string into individual rows](http://stackoverflow.com/questions/5493510/turning-a-comma-separated-string-into-individual-rows) after that grouping should be easy –  Apr 04 '14 at 00:06

1 Answers1

1

Use Hive's explode UDF to turn the array into multiple rows with one value each.

See the explode documentation on https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

Carter Shanklin
  • 2,967
  • 21
  • 18
  • Thanks. I get a UDFException when I try to do "explode" on that column. Added a new question for that http://stackoverflow.com/questions/22873020/hive-lateral-view-explode-udfargumentexception – user462455 Apr 04 '14 at 21:09