I'm working with Hive for the first time and it doesn't support pivot. So I'm trying to either do a series of self-joins or column expressions to get what need. I'm running into errors.
My first table (Employees) looks like this. There's four categorical groups in the role column that I'd like to turn into columns using group bys. One for C-Suite, one for D-Suite, etc. with the alias of the employees as the values.
alias | name | role | oversight | functional_area |
---|---|---|---|---|
jbob | Jo Bob | C-Suite | Finance | Orders |
jndo | Jon Doe | D-Suite | Finance | Orders |
jndo | Jon Doe | D-Suite | E-Commerce | Orders |
tweb | Tim Webb | SME | Analytics | Fraud |
kpow | Kat Power | C-Suite | Controls | Architecture |
lguan | Lin Guan | D-Suite | Sales | Marketing |
phar | Pat Harms | Manager | HR | HR |
This is what I'm working with:
SELECT
U.alias
U.name
U.role
U.oversight
U.functional_are
FROM employee U
LEFT OUTER JOIN
(SELECT
c.alias, c.oversight
FROM employee c
WHERE role = 'C-Suite' AND oversight <> 'N/A'
GROUP BY oversight, alias)
ON U.alias = c.alias
And I'm getting error messages on the join when I try and just self-join c-suite. I also get an error when I try and put c.alias as c_suite in the select statement and not do a subquery. In R this is done with gather and spread, but I'm getting confused with Hive syntax.