I am finding top 10 trending tweets in hive on basis of retweet_count ie The tweet which has highest retweet_count will be 1st and so-on....
Here is election Table details
id bigint from deserializer
created_at string from deserializer
source string from deserializer
favorited boolean from deserializer
retweeted_status struct<text:string,user:struct<screen_name:string,name:string>,retweet_count:int> from deserializer
entities struct<urls:array<struct<expanded_url:string>>,user_mentions:array<struct<screen_name:string,name:string>>,hashtags:array<struct<text:string>>> from deserializer
text string from deserializer
user struct<screen_name:string,name:string,friends_count:int,followers_count:int,statuses_count:int,verified:boolean,utc_offset:int,time_zone:string,location:string> from deserializer
in_reply_to_screen_name string from deserializer
my query
select text
from election
where retweeted_status.retweet_count IN
(select retweeted_status.retweet_count as zz
from election
order by zz desc
limit 10);
It is returning me the same tweet 10 times. (TWEET-ABC, TWEET-ABC, TWEET-ABC, . . . TWEET-ABC)
So what I did is break the nested query , when I run inner query
select retweeted_status.retweet_count as zz
from election
order by zz desc
limit 10
It return 10 different values (1210,1209,1208,1207,1206,....1201)
Afterwards when I run my outer query
select text
from election
where retweeted_status.retweet_count
IN (1210,1209,1208,1207,1206,....1201 );
The results are same 10 tweets (TWEET-ABC, TWEET-ABC, TWEET-ABC, . . . TWEET-ABC)
Whats wrong in my query logic ?