Hive- how to get the derive column names and use it in the same query?

Question

I am trying to run the below query :

select [every_column],count(*) from <table> group by [every_column] having count(*) >1

But column names should be derived in the same query. I believe show columns in would list down the column names separated by new line. But I need to use it in one query to retrieve the result. Appreciate any help in this regard.

score 1 · Answer 1 · answered Mar 14 '19 at 01:03

1

You can use shell sed to search the newlines(\n) and replace with comma(,).

Assign the comma separated column names to a hive variable, use the variable name in your hive query.

References for sed and set hive variables

answered Mar 14 '19 at 01:03

notNull

30,258
4
35
50

score 0 · Answer 2 · answered Mar 13 '19 at 20:26

Have you thought of using subqueries or even CTE? Maybe this helps you find your answer:

select outer.col1,
       outer.col2,
       outer.col3,
       outer.col4,
       outer.col5,
       outer.col6, count(*) as cnt
from (
        select <some logic> as col1,
               <some logic> as col2,
               <some logic> as col3,
               <some logic> as col4,
               <some logic> as col5,
               <some logic> as col6
        from innerTable
)outer
group by outer.col1,
       outer.col2,
       outer.col3,
       outer.col4,
       outer.col5,
       outer.col6

Hi, thanks! But I am looking to get the column names in the subquery and then use the column names in the outer query — keeplearning, Mar 13 '19 at 20:31

Hive- how to get the derive column names and use it in the same query?

2 Answers2