-1

I'm working with Hive for the first time and it doesn't support pivot. So I'm trying to either do a series of self-joins or column expressions to get what need. I'm running into errors.

My first table (Employees) looks like this. There's four categorical groups in the role column that I'd like to turn into columns using group bys. One for C-Suite, one for D-Suite, etc. with the alias of the employees as the values.

alias name role oversight functional_area
jbob Jo Bob C-Suite Finance Orders
jndo Jon Doe D-Suite Finance Orders
jndo Jon Doe D-Suite E-Commerce Orders
tweb Tim Webb SME Analytics Fraud
kpow Kat Power C-Suite Controls Architecture
lguan Lin Guan D-Suite Sales Marketing
phar Pat Harms Manager HR HR

This is what I'm working with:

SELECT
    U.alias
    U.name
    U.role
    U.oversight
    U.functional_are
FROM employee U
LEFT OUTER JOIN 
   (SELECT 
       c.alias, c.oversight
   FROM employee c
   WHERE role = 'C-Suite' AND oversight <> 'N/A'
   GROUP BY oversight, alias)
ON U.alias = c.alias 

And I'm getting error messages on the join when I try and just self-join c-suite. I also get an error when I try and put c.alias as c_suite in the select statement and not do a subquery. In R this is done with gather and spread, but I'm getting confused with Hive syntax.

philipxy
  • 14,867
  • 6
  • 39
  • 83
  • A subquery needs an alias. (Excepting certain rogue DBMSs.) c is not known outside the subquery. (What do you think it represents & how/why?--Rhetorical.) PS Turn on all errors & warnings & logs & chop code down until you don't get the error then keep adding minimal code back until you get the error. Try to chop out all things that don't seem related to the error. Read the manual & introductions re the grammar/syntax. Understand what each subexpression value is & what names denote. PS If the syntax "confuses" you why aren't you researching/asking about that before writing code?--Rhetorical. – philipxy Aug 03 '22 at 20:30
  • Debug questions require a [mre]--cut & paste & runnable code including initialization; desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. For SQL include DDL & tabular initialization code. For debug that includes the least code you can give that is code that you show is OK extended by code that you show is not OK. [ask] [Help] When you get a result you don't expect, pause your overall goal, chop to the 1st subexpression with unexpected result & say what you expected & why, justified by documentation. (Debugging fundamental.) – philipxy Aug 03 '22 at 20:33
  • For clarity: Make table aliases very short & mnemonic, define one for every JOIN base table & subquery, & dot every column name with its alias. [When to use SQL Table Alias](https://stackoverflow.com/q/198196/3404097) etc. With no JOIN you must give still give an alias after a subquery but otherwise there's no need to give or dot with an alias. – philipxy Aug 03 '22 at 21:40
  • Try - `SELECT U.alias U.name U.role U.oversight U.functional_are FROM employee U LEFT OUTER JOIN (SELECT c.alias, c.oversight FROM employee c WHERE role = 'C-Suite' AND oversight <> 'N/A' GROUP BY oversight, alias) c ON U.alias = c.alias ` – Koushik Roy Aug 04 '22 at 04:37
  • Thank you, Koushik! That works. I also found a way using CASE. – newuser00989 Aug 04 '22 at 15:18

1 Answers1

0

For anyone else coming across this question, I also have a piece of proprietary software in addition to Hive that I'm using to make extracts and it cherrypicks SQL syntax. Some functions work, some don't. CASE worked for me. Not the cleanest query, but it works like pivot. I'm able to make joins after from statement.

SELECT 
CASE WHEN U.role = 'C-Suite' THEN U.alias 
ELSE 'NULL' END AS "C_Alias",
CASE WHEN U.role = 'D-Suite' THEN U.alias
ELSE 'NULL' END AS "D_Alias",
CASE WHEN U.role = 'SME' THEN U.alias
ELSE 'NULL' END AS "SME_Alias",
CASE WHEN U.role = 'Manager' THEN U.alias
ELSE 'NULL' END AS "Manager_Alias",
U.role
,U.name
,U.oversight
,U.functional_area
FROM employees U