0

I am trying to run the below query :

select [every_column],count(*) from <table> group by [every_column] having count(*) >1

But column names should be derived in the same query. I believe show columns in would list down the column names separated by new line. But I need to use it in one query to retrieve the result. Appreciate any help in this regard.

keeplearning
  • 369
  • 2
  • 6
  • 17

2 Answers2

1

You can use shell sed to search the newlines(\n) and replace with comma(,).

  • Assign the comma separated column names to a hive variable, use the variable name in your hive query.

References for sed and set hive variables

notNull
  • 30,258
  • 4
  • 35
  • 50
0

Have you thought of using subqueries or even CTE? Maybe this helps you find your answer:

select outer.col1,
       outer.col2,
       outer.col3,
       outer.col4,
       outer.col5,
       outer.col6, count(*) as cnt
from (
        select <some logic> as col1,
               <some logic> as col2,
               <some logic> as col3,
               <some logic> as col4,
               <some logic> as col5,
               <some logic> as col6
        from innerTable
)outer
group by outer.col1,
       outer.col2,
       outer.col3,
       outer.col4,
       outer.col5,
       outer.col6
Raunak Jhawar
  • 1,541
  • 1
  • 12
  • 21
  • Hi, thanks! But I am looking to get the column names in the subquery and then use the column names in the outer query – keeplearning Mar 13 '19 at 20:31