1

On mysql I would enter the following query, but running the same on google BigQuery throws an error for the upper limit. How do I specify limits on a query? Say I have a query that returns 20 results and I want results between 5 and 10 only, how should I frame the query on Google BigQuery?)

For example:

SELECT id,
       COUNT(total) AS total 
FROM ABC.data 
GROUP BY id 
ORDER BY count DESC 
      LIMIT 5,10;

If I only put "LIMIT 5" on the end of the query, I get the top 5 and if I put "LIMIT 10" I ge t the top 10, but what syntax do I use to get between 5 and 10.

Could someone please shed some light on this?

Any help is much appreciated.

Thanks and have a great day.

user2989892
  • 63
  • 2
  • 8
  • 1
    What RDBMW do you use? – Krzysztof Nov 18 '13 at 09:04
  • In Rails I use mysql so I would use the above, but running the same for on the bigquery ui results in an error. – user2989892 Nov 18 '13 at 09:48
  • could you not add a rank variable using the rank function and the filter the result using a where clause? I never used MySql but that is how I would do it in Postgres, here is a link to the rank function in MySQL: http://stackoverflow.com/questions/3333665/mysql-rank-function – hirolau Nov 18 '13 at 11:27

3 Answers3

4

I would use window functions... something like

select * from 
(Select id, total, row_number() over (order by total desc) as rnb
from
(SELECT id,
       COUNT(total) AS total 
FROM ABC.data 
GROUP BY id 
))
where rnb>=5 and rnb<=10
N.N.
  • 3,094
  • 21
  • 41
1

The windowing function answer is a good one, but I thought I'd give another option that involves how your result is fetched rather than how the query is run.

If you only need the first N rows you can add a LIMIT N to your query. But if you don't need the first M rows, you can change how you fetch the results. If you're using the the java API, you can use the setStartIndex() method on either the TableData.list() or the Jobs.getQueryResults() call to only fetch rows starting from a particular index.

Jordan Tigani
  • 26,089
  • 4
  • 60
  • 63
0

That question makes no sense to an ever changing dataset. if you have a 1 second delay between when you ask for the first 5 and the next 5... the data could have changed. It's order is now different and you will miss data or get duplicate results. So databases like BigTable have a method for doing one query of the data and giving you the resultset to you in groups. If that were the case: What you are looking for is called query cursors. I can't say this any better than their own example so [Here is the documentation on them.][1]

But since you said the data does not change then fetch() will work just fine. fetch() has 2 options you will want to take note of limit and offset. 'limit' is the maximum number of results to return. If set to None, all available results will be retrieved. 'offset' is how many results to skip. Check out other options here: https://developers.google.com/appengine/docs/python/datastore/queryclass#Query_fetch

Back2Basics
  • 7,406
  • 2
  • 32
  • 45
  • I'm not following you. In my case, the data doesn't change once its in google bigquery. Once it is in I would like to run several queries, some with limits. For example I have something that returns 20 rows, and I want only rows 5-10 or similar. How is this acheived on Bigquery? Can it be achieved on Bigquery? – user2989892 Nov 18 '13 at 09:51
  • I think we are talking about two different things here - I am referring to Google Bigquery and you are referring to Appengine/Datastore. The Bigquery docs only for limit only specify a number, not a range which is what I am trying to obtain. Thanks for looking at the question, all the same. – user2989892 Nov 18 '13 at 10:47
  • Appengine runs on Google BigTable (google's NoSql DB). I thought bigquery was another name for GQL which is available on BigTable. Sorry for the mixup. – Back2Basics Nov 18 '13 at 18:26