0

Hi I have a postgresql "TABLE1" with 2.7 million records. Each record has a field "FIELD1" that may be empty or may have data. I want a SELECT statement or method that a) returns the first 1000 results from TABLE1 with FIELD1 empty, and b) randomly pick one of the records to return to a python variable. Help???

user2104778
  • 992
  • 1
  • 14
  • 38
  • 2
    This question might be related: http://stackoverflow.com/questions/8674718/best-way-to-select-random-rows-postgresql – svk Jul 02 '13 at 14:40
  • You need to define: "empty", "first", your requirements (speed? repeated use?). Please supply a table definition. – Erwin Brandstetter Jul 02 '13 at 16:32

2 Answers2

2

For selecting first 1000 result you can use limit in your query

SELECT field1 FROM table1 ORDER BY id Limit 1000;

The result will be a list in python. So you can use python random module to operate on the result list.

Ansuman Bebarta
  • 7,011
  • 6
  • 27
  • 44
  • 1
    How does postgresql choose the 1000? Is it in order of primary key number? If yes, how can I get 1000 random records? – user2104778 Jul 02 '13 at 14:22
  • You can add ORDER BY clause to query also. I have edited my answer. Default order by has to be primary id but not always guaranted to be ordered. – Ansuman Bebarta Jul 02 '13 at 15:01
1

If performance is not a concern:

SELECT *
FROM  (
   SELECT *
   FROM   tbl
   WHERE  field1 IS NULL
   ORDER  BY id           --?? unclear from question
   LIMIT  1000
   ) sub
ORDER  BY random()
LIMIT  1;

This returns 1 perfectly random row from the "first" 1000 empty rows.
"Empty" meaning NULL, and "first" meaning smallest id.

If performance is a concern, you need to be a lot more specific.
If your circumstances match, this related answer might of help:
Best way to select random rows PostgreSQL

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228