0

I have a table called names, and I want to select 2 names after being count(*) as uniq, and then another 2 names just from the entire sample pool.

firstname
John
John
Jessica
Mary
Jessica
John
David
Walter

So the first 2 names would select from a pool of John, Jessica, and Mary etc giving them equal chances of being selected, while the second 2 names will select from the entire pool, so obvious bias will be given to John and Jessica with multiple rows.

I'm sure there's a way to do this but I just can't figure it out. I want to do something like

SELECT uniq.firstname 
FROM (SELECT firstname, count(*) as count from names GROUP BY firstname) uniq
limit 2
AND
SELECT firstname
FROM (SELECT firstname from names) limit 2

Is this possible? Appreciate any pointers.

user1899415
  • 3,015
  • 7
  • 22
  • 31
  • So... You want to select twice, but I'm foggy on your criteria. The first select you want to just select two random names? The second you want to select the two names that appear the most? What exactly is the criteria for selection? – JNevill Sep 15 '14 at 20:28
  • Yes. The actual database is actually much bigger, I just simplified for illustration purposes. First select gives all likelihood to the names, while second select is biased toward the most representative. – user1899415 Sep 15 '14 at 20:33
  • Possible duplicate of [How to randomly select rows in SQL?](http://stackoverflow.com/questions/580639/how-to-randomly-select-rows-in-sql) – Ram Sep 15 '14 at 20:38

2 Answers2

0

I think you are close but you need some randomness for the sampling:

(SELECT uniq.firstname 
 FROM (SELECT firstname, count(*) as count from names GROUP BY firstname) uniq
 ORDER BY rand()
 limit 2
)
UNION ALL
(SELECT firstname
 FROM from names
 ORDER BY rand()
 limit 2
)
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

As mentioned here you can use RAND or similar functions to achieve it depending on the database.

MySQL:

SELECT firstname 
FROM (SELECT firstname, COUNT(*) as count FROM names GROUP BY firstname) 
ORDER BY RAND()
LIMIT 2

PostgreSQL:

SELECT firstname 
FROM (SELECT firstname, COUNT(*) as count FROM names GROUP BY firstname)  
ORDER BY RANDOM()
LIMIT 2

Microsoft SQL Server:

SELECT TOP 2 firstname 
FROM (SELECT firstname, COUNT(*) as count FROM names GROUP BY firstname) 
ORDER BY NEWID()

IBM DB2:

SELECT firstname , RAND() as IDX 
FROM (SELECT firstname, COUNT(*) as count FROM names GROUP BY firstname) 
ORDER BY IDX FETCH FIRST 2 ROWS ONLY

Oracle:

SELECT firstname 
FROM(SELECT firstname, COUNT(*) as count FROM names GROUP BY firstname ORDER BY dbms_random.value )
WHERE rownum in (1,2)

Follow the similar approach for selecting from entire pool

Community
  • 1
  • 1
Ram
  • 3,092
  • 10
  • 40
  • 56