2

I have one table in hive table1. I am using impala to fetch data from table

table1
------
name, amount

where values of table are

test1, 10
test1, 15
test1, 30

test2, 30
test2, 40
test2, 50

test3, 30
test3, 40
test3, 50

Now I have to fetch data from table1 such that, it fetch data for all name (test1, test2, test3) but gives only top 2 records based on amount for each name.

Can it possible in IMPALa or even in MYSQL?

Thanks in advance

Poonam Bhatt
  • 10,154
  • 16
  • 53
  • 72

3 Answers3

1

If you're using Impala 2.0 or greater, you can use analytic functions to accomplish this task:

SELECT name, amount
FROM (SELECT name, amount, row_number() OVER (PARTITION BY name ORDER BY amount DESC) AS pos
      FROM table1) t
WHERE pos < 3;

If you must use MySQL, it appears that you can fake window functions using user-defined variables, as demonstrated in another question on StackOverflow.

Community
  • 1
  • 1
Jeff Hammerbacher
  • 4,226
  • 2
  • 29
  • 36
1

You can use regexp_extract to get the limits of concatenated amounts. {1,1} will give you the top 2. {1,49} will give you the top 50, etc.

For example (top 2):

>select name, regexp_extract(group_concat(cast(amount as string),','), '^(([0-9]+,)<b>{1,1}</b>[0-9]+|[0-9]+)',0) as top_two_amount from (select name, amount from table1 order by name, amount desc) t group by name;
Mihai Maruseac
  • 20,967
  • 7
  • 57
  • 109
0

I had a difficult time understanding your question, but if I understand you correctly, you're trying to get a listing of all the amounts by name. If that's the case, in MySQL, this is pretty simple:

  SELECT name, GROUP_CONCAT( amount ) AS amounts FROM table1 GROUP BY name ;

This will give you a row for each name with all the amounts associated with that name.

kbcmdba
  • 41
  • 3