1

I'm working on a project where we need to display BigQuery results in a table within a web application.

We’ve built the feature by paging, sorting and searching directly in BigQuery, but the performance isn’t what you would expect of a modern web application. It takes several seconds the apply a search term or change a page.

I can't really share much code, but this a general question that applies any large resultset generated in BigQuery.

For little bit of context. We create a view in BigQuery by joining a product catalog to orders.

WITH Catalog AS
(
  SELECT 
    productId,
  FROM `CatalogTable`
),
Orders AS (
  SELECT 
    p.productId,
    SUM(p.qty) AS qty 
  FROM `OrdersView` as o, o.products AS p 
  GROUP BY p.productId
)
SELECT
  c.productId,
  IF(o.qty IS NULL, 0, o.qty) AS qty,
  ROW_NUMBER() OVER(ORDER BY qty DESC) as salesRank 
FROM Catalog AS c
LEFT JOIN
 Orders AS o
 ON CONCAT(c.name, c.sku) = CONCAT(o.name, o.sku)

And the view is queried like so:

SELECT ...
FROM   `catalog` c
LEFT JOIN `catalogView` cv
WHERE  c.name LIKE '%searchTerm%'
LIMIT  10
OFFSET 0

What are the options for making this grid-view perform as it would if it were built on a traditional SQL database (or close to the performance)?

I've considered clustering, but i don't believe this is an option since i'm not partitioning the table:

https://medium.com/google-cloud/bigquery-optimized-cluster-your-tables-65e2f684594b

NOTES:

It's acceptable for the results to be a little delayed, if streaming the results into another database is an option.

The query is called via a WebApi endpoint and displayed in an Angular grid-view.

New orders are imported every 15 minutes so the results from this query won't be entirely static, they can change periodically.

The data-grid must support paging, sorting and searching, and the grid could contain 10,000 plus results.

chrism233
  • 97
  • 10
  • Can you specify better how is your data, what kind of query you perform and how is your architecture as well?? – rmesteves Jan 14 '20 at 09:33
  • I agree with rmesteves, we need more context about your issue, could you please send which queries and dummy/example code have you tried so far. – Pol Ortiz Jan 14 '20 at 16:03
  • Thanks, i've added a little more detail, but i think this answers my question: https://stackoverflow.com/questions/36228532/can-bigquery-be-fast-enough-for-real-time-onsite-request Maybe there are other options though? – chrism233 Jan 14 '20 at 22:44
  • In fact BigQuery should not be used if you expect an OLTP behavior. If you want to use GCP i'd suggest you to take a look at Cloud SQL and Cloud Spanner as you can see here https://cloud.google.com/sql/ and here https://cloud.google.com/spanner/ – rmesteves Jan 15 '20 at 07:57
  • _Maybe there are other options though?_ How computationally capable are the webclients? E.g. are the webclients mostly modern laptops or smartphones/old tablets? If it's the former then there could be a way to avoid adding another database as a caching layer . – winwiz1 Jan 16 '20 at 12:48

1 Answers1

1

BigQuery should not be used if you expect OLTP behavior or performance. In your case, if you want to keep your project on GCP and also keep your data model as similar as possible with the model you already have, I would suggest you to take a look at Cloud SQL and Cloud Spanner.

Both are fully managed Relational Databases. The main difference is that Cloud Spanner is horizontally scalable whereas Cloud SQL is not, i.e. if you need only one node, use Cloud SQL. If you need to grow up your cluster, use Cloud Spanner.

Furthermore, both of them have it's respective Web APIs. You can find the Cloud Spanner Web API reference here. For the Cloud SQL, the reference depends on which DBMS you choose: SQLServer, MySQL or PostgreSQL.

I hope it helps

rmesteves
  • 3,870
  • 7
  • 23