I know from this question that one can do random sampling with RAND:
SELECT * FROM [table] WHERE RAND() < percentage
But this requires a full table scan and costs the same as reading the whole table. Is there a more efficient way?
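To make the cost concern concrete, here is a minimal local sketch in Java (not BigQuery itself) of what a RAND() < percentage filter amounts to, assuming each row is kept independently with probability percentage. The class and method names (BernoulliSample, sample) are hypothetical and only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class BernoulliSample {
    // Keeps each row independently with probability `fraction`.
    // Every row is visited once, which is why this style of sampling
    // implies a full scan of the input.
    static <T> List<T> sample(List<T> rows, double fraction, Random rng) {
        List<T> kept = new ArrayList<>();
        for (T row : rows) {
            // nextDouble() is uniform in [0, 1), like a per-row RAND() value.
            if (rng.nextDouble() < fraction) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            rows.add(i);
        }
        List<Integer> kept = sample(rows, 0.01, new Random());
        System.out.println("kept " + kept.size() + " of " + rows.size() + " rows");
    }
}
```

The sample size is only approximately fraction times the row count, and the loop touches every row no matter how small the fraction is.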
I'm experimenting with the tabledata.list API, but I get java.net.SocketTimeoutException: Read timed out when index is very large (i.e. > 10000000). Is this operation not O(1)?
bigquery
    .tabledata()
    .list(tableRef.getProjectId, tableRef.getDatasetId, tableRef.getTableId)
    .setStartIndex(index)
    .setMaxResults(1L)
    .execute()