
I'm looking into the possibility of using BigQuery and its API to run on-site queries based on the content our visitors are viewing, so response time is crucial. I have loaded a very simply structured dataset of 10k rows (4 columns) and run a very simple query, which takes between 1 and 2 seconds. My question is hopefully pretty simple to answer: will I ever be able to get a <1 s response time from the BigQuery API by optimising the data in some way, or not?
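
My test is roughly along these lines (a simplified sketch; the table and query here are placeholders):

    import time

    # Assumes the google-cloud-bigquery client library and default credentials.
    from google.cloud import bigquery

    client = bigquery.Client()

    query = """
        SELECT col1, col2
        FROM `my_project.my_dataset.my_table`   -- placeholder table, ~10k rows
        WHERE col3 = 'some_value'
    """

    start = time.monotonic()
    rows = list(client.query(query).result())  # waits for the job to finish
    elapsed = time.monotonic() - start

    print(f"{len(rows)} rows in {elapsed:.2f}s")  # consistently 1-2s, never <1s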

Thanks a lot in advance!

Regards, Arjen


3 Answers


BigQuery is not an online transaction processing (OLTP) database but an interactive analytics database that makes it possible to scan terabytes of data within seconds. While query times are fairly consistent, BigQuery is a shared service, so they are not guaranteed: a query that runs for 2 seconds might take 1.5 or 3 seconds at different times. Due to the nature and internals of BigQuery, a query time of <1 s is not realistic as of today.

One popular design pattern is to let BigQuery do the heavy lifting of the complex analysis of your data, then store the results in an OLTP database (like MySQL) or even an in-memory store (like Redis) and serve them to clients from there. You can keep the data fresh by periodically re-running the queries in the background.
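
For illustration, a minimal sketch of that pattern (assuming the google-cloud-bigquery and redis Python client libraries; the table, query, and key names are made up):

    import json

    import redis
    from google.cloud import bigquery

    bq = bigquery.Client()
    cache = redis.Redis(host="localhost", port=6379)

    def refresh_cache():
        """Run the heavy query in BigQuery and store the results in Redis."""
        query = """
            SELECT page_id, COUNT(*) AS views
            FROM `my_project.analytics.pageviews`
            GROUP BY page_id
        """
        # This loop runs periodically in the background (e.g. from a cron
        # job), not per request; serve-time reads hit Redis only.
        for row in bq.query(query).result():
            cache.set(f"views:{row.page_id}", json.dumps({"views": row.views}))

    def get_views(page_id):
        """Serve a request from the in-memory cache in well under a second."""
        cached = cache.get(f"views:{page_id}")
        return json.loads(cached) if cached else None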

DoiT International
  • Thanks for this answer! So would using e.g. Google Cloud SQL combined with Google App Engine seem like a more suited solution to you? – Arjen Mar 25 '16 at 22:30
  • @Arjen - that would depend on the volume of your data. For 10k rows with 4 columns, I would definitely go with CloudSQL Gen1 or Gen2 and use AppEngine to serve the results. Just make sure you have the right indexes so your query is served from an index and not a full scan (see the sketch after these comments). – DoiT International Mar 25 '16 at 22:33
  • that was just for testing purposes. Say it were 1000k rows with 25 columns, would CloudSQL still be a logical choice? – Arjen Mar 25 '16 at 22:42
  • @Arjen - I think so. As long as you know up-front what queries you're going to run and build indexes accordingly, you will be ok with CloudSQL. – DoiT International Mar 25 '16 at 22:46
  • @Arjen - if you're satisfied with my answer, please mark it as "accepted". This is [why it's important](http://stackoverflow.com/help/accepted-answer). Thank you! – DoiT International Mar 26 '16 at 06:57
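
To illustrate the indexing advice from the comments above, a hypothetical sketch (assuming the mysql-connector-python client against a Cloud SQL MySQL instance; the table and column names are made up):

    import mysql.connector

    conn = mysql.connector.connect(
        host="127.0.0.1", user="app", password="...", database="content"
    )
    cur = conn.cursor()

    # Index the column(s) your serving query filters on, so lookups are
    # answered from the index instead of a full table scan.
    cur.execute("CREATE INDEX idx_page_id ON pageviews (page_id)")

    # EXPLAIN should now show type=ref with key=idx_page_id, not type=ALL.
    cur.execute("EXPLAIN SELECT views FROM pageviews WHERE page_id = %s", ("home",))
    print(cur.fetchall())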

BigQuery is designed for interactive analysis of MASSIVELY large datasets.
Processing time is expected to be on the order of seconds.
At the same time, expecting that running a query against a smaller dataset will bring the runtime under a second is not realistic, simply because BigQuery is not designed that way.
No matter what data size you have, you still "pay" all the fixed overhead as if you were processing a big dataset.

So, the short answer, unfortunately:
Today – No! Tomorrow – No! The day after – you never know, but I don't think so, unless BigQuery evolves into something else

Mikhail Berlyant

Although the others have already stated that the answer is "no", it could be interesting for you to have a look at the BigQuery Storage API. It lets you run simple SELECT-WHERE queries on regular tables directly against the storage layer where BigQuery's data physically lives. It should be faster than the regular BigQuery API, but will probably not respond within 1 s reliably (I haven't tried it myself yet, but will update my answer once I have).
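
Such a SELECT-WHERE read might look roughly like this (a sketch assuming the google-cloud-bigquery-storage Python client; project, table, and column names are placeholders):

    from google.cloud.bigquery_storage import BigQueryReadClient, types

    client = BigQueryReadClient()

    requested_session = types.ReadSession(
        table="projects/my_project/datasets/my_dataset/tables/my_table",
        data_format=types.DataFormat.AVRO,
    )
    # The SELECT-WHERE part: column projection plus a row filter, both
    # applied server-side before any rows are streamed back.
    requested_session.read_options.selected_fields = ["col1", "col2"]
    requested_session.read_options.row_restriction = "col3 = 'some_value'"

    session = client.create_read_session(
        parent="projects/my_project",
        read_session=requested_session,
        max_stream_count=1,
    )

    # Read the rows from the single stream we requested.
    reader = client.read_rows(session.streams[0].name)
    for row in reader.rows(session):
        print(row)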

Backlin