27

Would it be possible to run a python function in BigQuery?

It seems like C can be compiled to WebAssembly and run in BQ, per this blog post from Felipe.

And of course Python can be compiled to C or C++ using cython or some other tools (or it could even be transpiled to javascript). So then my question is does anyone have any experience executing a python function in BigQuery. If so, what is the flow that you're using to do it?

Possible options here are:

  • "Transform" the python into javascript to run.
  • Compile the python into c or cpp and compile that into wasm

Here is an example input to work with:

(1) Source

id         product 
1          box     
2          bottle

(2) Python functions to use

def double_id(row):
    return row['id'] * 2

def product_code(row):
    # B3
    return row['product'].upper()[0] + str(len(row['product']))

(3) Expected output

id     product      double_id       product_code
1      box          2               B3
2      bottle       4               B6

I'm not just looking to re-write the above using javascript (which would probably be the easiest way to do this), but I'm looking for a more generalized solution, if there is one that exists -- where I can take a python (standard library) function and use it in a BigQuery query.

KT12
  • 549
  • 11
  • 24
David542
  • 104,438
  • 178
  • 489
  • 842
  • 2
    I have no experience of doing this. However based on current BigQuery ability, it may be possible to either run a compiled-into-javascript-python-function or run python script from a javascript-based Python interpreter. Here's [link](https://stackoverflow.com/questions/30155551/python-in-browser-how-to-choose-between-brython-pypy-js-skulpt-and-transcrypt) and [link](https://www.google.com/search?q=python+interpreter+in+javascript&oq=python+interpreter+in+java&aqs=chrome.0.0j69i57j0l3.6943j0j7&sourceid=chrome&ie=UTF-8). – Yun Zhang Apr 01 '19 at 21:52
  • 5
    You can query BigQuery using the Python API, then you apply any custom code operations you like: https://cloud.google.com/bigquery/docs/reference/libraries - is this what you are looking for? I also use Cloud Functions to perform BigQuery ETL jobs. – Ben P Apr 02 '19 at 09:55
  • I'm very curious - what would be the added benefit of python over JS? – Robert Lacok Apr 02 '19 at 16:21
  • @RobertLacok it's a design/UI decision -- technically speaking, yes JS would be preferable. – David542 Apr 02 '19 at 19:51
  • Do you want a bigquery ORM? – xilpex Apr 03 '19 at 20:54
  • @Xilpex -- no -- – David542 Apr 03 '19 at 20:58
  • 3
    Depending on the scope of what you are ultimately trying to accomplish, there is a [BigQuery Storage API integration with pandas](https://cloud.google.com/bigquery/docs/bigquery-storage-python-pandas) that allows you to download query results to a DataFrame. This may add an extra step that you are hoping to avoid but ultimately gets you into a position to run python functions. – benvc Apr 08 '19 at 22:22
  • I tried shedskin + compiling with emscripten. The resulting wasm binary didn't validate using `wasm-validate`. I'd have to say this is impractical and it would be better to craft the functionality in c/c++ or javascript – stacksonstacks Apr 11 '19 at 00:28
  • If your data is bigger than your memory, you can use Cloud DataFlow and writing Beam code to read from big query, applied some transformation using python functions, and write down to big query. Look at this post: “How to run Python code on your BigQuery table” by Lak Lakshmanan https://link.medium.com/IZAsdpXDvX – Netanel Malka Jun 14 '19 at 09:13

1 Answers1

1

Maybe i'm wrong but why not trying google-cloud-bigquery (you can install it with pip)?

You then call it with:

from google.cloud import bigquery

client = bigquery.Client()

and send your method calling bigquery.QueryJobConfig.

Memristor
  • 599
  • 4
  • 11