self-joining in hive using different where conditions

Question

I'm working with Hive for the first time and it doesn't support pivot. So I'm trying to either do a series of self-joins or column expressions to get what need. I'm running into errors.

My first table (Employees) looks like this. There's four categorical groups in the role column that I'd like to turn into columns using group bys. One for C-Suite, one for D-Suite, etc. with the alias of the employees as the values.

alias	name	role	oversight	functional_area
jbob	Jo Bob	C-Suite	Finance	Orders
jndo	Jon Doe	D-Suite	Finance	Orders
jndo	Jon Doe	D-Suite	E-Commerce	Orders
tweb	Tim Webb	SME	Analytics	Fraud
kpow	Kat Power	C-Suite	Controls	Architecture
lguan	Lin Guan	D-Suite	Sales	Marketing
phar	Pat Harms	Manager	HR	HR

This is what I'm working with:

SELECT
    U.alias
    U.name
    U.role
    U.oversight
    U.functional_are
FROM employee U
LEFT OUTER JOIN 
   (SELECT 
       c.alias, c.oversight
   FROM employee c
   WHERE role = 'C-Suite' AND oversight <> 'N/A'
   GROUP BY oversight, alias)
ON U.alias = c.alias

And I'm getting error messages on the join when I try and just self-join c-suite. I also get an error when I try and put c.alias as c_suite in the select statement and not do a subquery. In R this is done with gather and spread, but I'm getting confused with Hive syntax.

A subquery needs an alias. (Excepting certain rogue DBMSs.) c is not known outside the subquery. (What do you think it represents & how/why?--Rhetorical.) PS Turn on all errors & warnings & logs & chop code down until you don't get the error then keep adding minimal code back until you get the error. Try to chop out all things that don't seem related to the error. Read the manual & introductions re the grammar/syntax. Understand what each subexpression value is & what names denote. PS If the syntax "confuses" you why aren't you researching/asking about that before writing code?--Rhetorical. — philipxy, Aug 03 '22 at 20:30
Debug questions require a [mre]--cut & paste & runnable code including initialization; desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. For SQL include DDL & tabular initialization code. For debug that includes the least code you can give that is code that you show is OK extended by code that you show is not OK. [ask] [Help] When you get a result you don't expect, pause your overall goal, chop to the 1st subexpression with unexpected result & say what you expected & why, justified by documentation. (Debugging fundamental.) — philipxy, Aug 03 '22 at 20:33
For clarity: Make table aliases very short & mnemonic, define one for every JOIN base table & subquery, & dot every column name with its alias. [When to use SQL Table Alias](https://stackoverflow.com/q/198196/3404097) etc. With no JOIN you must give still give an alias after a subquery but otherwise there's no need to give or dot with an alias. — philipxy, Aug 03 '22 at 21:40
Try - `SELECT U.alias U.name U.role U.oversight U.functional_are FROM employee U LEFT OUTER JOIN (SELECT c.alias, c.oversight FROM employee c WHERE role = 'C-Suite' AND oversight <> 'N/A' GROUP BY oversight, alias) c ON U.alias = c.alias ` — Koushik Roy, Aug 04 '22 at 04:37
Thank you, Koushik! That works. I also found a way using CASE. — newuser00989, Aug 04 '22 at 15:18

newuser00989 · Accepted Answer · 2022-08-04T15:36:38.003

For anyone else coming across this question, I also have a piece of proprietary software in addition to Hive that I'm using to make extracts and it cherrypicks SQL syntax. Some functions work, some don't. CASE worked for me. Not the cleanest query, but it works like pivot. I'm able to make joins after from statement.

SELECT 
CASE WHEN U.role = 'C-Suite' THEN U.alias 
ELSE 'NULL' END AS "C_Alias",
CASE WHEN U.role = 'D-Suite' THEN U.alias
ELSE 'NULL' END AS "D_Alias",
CASE WHEN U.role = 'SME' THEN U.alias
ELSE 'NULL' END AS "SME_Alias",
CASE WHEN U.role = 'Manager' THEN U.alias
ELSE 'NULL' END AS "Manager_Alias",
U.role
,U.name
,U.oversight
,U.functional_area
FROM employees U

self-joining in hive using different where conditions

1 Answers1