21

I'm trying to create a set of unit tests for code that uses the Google Client Library for BigQuery. I'm struggling to write a unittest file that mocks the client and lets me test my inputs. Below is a simple script that prints a dataset's properties and lists the tables that belong to it.

Would somebody show me an example of mocking the Google Client Library? The tests I have found at https://github.com/googleapis/google-cloud-python/blob/master/bigquery/tests/unit/test_client.py do not interact directly with the methods my code calls, so I am unable to apply them to my code.

I'd appreciate any ideas or ways to achieve this; I can't find this problem documented anywhere on Stack Overflow.

Thanks

from google.cloud import bigquery


def get_dataset():
    client = bigquery.Client.from_service_account_json('some_client_secret.json')

    dataset_id = 'some_project.some_dataset'

    dataset = client.get_dataset(dataset_id)

    full_dataset_id = "{}.{}".format(dataset.project, dataset.dataset_id)
    friendly_name = dataset.friendly_name
    print(
        "Got dataset '{}' with friendly_name '{}'.".format(
            full_dataset_id, friendly_name
        )
    )

    # View dataset properties
    print("Description: {}".format(dataset.description))
    print("Labels:")
    labels = dataset.labels
    if labels:
        for label, value in labels.items():
            print("\t{}: {}".format(label, value))
    else:
        print("\tDataset has no labels defined.")

    # View tables in dataset
    print("Tables:")
    tables = list(client.list_tables(dataset))  # API request(s)
    if tables:
        for table in tables:
            print("\t{}".format(table.table_id))
    else:
        print("\tThis dataset does not contain any tables.")
Py.Jordan

2 Answers

36

It took a fair amount of Googling, and trial and error, to figure out how to do this, and I just got it working, so I thought it was worth sharing.

unittest.mock provides patch, which lets you replace an object at the point of use (i.e. replace a Google API call in your code under test), and Mock, which lets you further customise the result of accessing attributes and calling functions on that mock.

The unittest docs explain patching here: https://docs.python.org/3/library/unittest.mock.html#where-to-patch

This does explain how it works, but the best explanation I found of how to do this properly is: http://alexmarandon.com/articles/python_mock_gotchas/
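To make the "where to patch" point concrete, here is a minimal sketch that runs without any Google library installed. The names `library`, `app`, `real_client` and `run` are hypothetical stand-ins: `library` plays the role of `google.cloud.bigquery`, and `app` plays the role of a module that did `from library import Client`. It simulates modules with plain namespaces, so treat it as an illustration of the lookup rule rather than of real imports.

```python
from types import SimpleNamespace
from unittest.mock import patch


def real_client():
    return "real"


# `library` stands in for the third-party package. `app` stands in for your
# module after `from library import Client`, which copies the reference.
library = SimpleNamespace(Client=real_client)
app = SimpleNamespace(Client=library.Client)


def run():
    # Code under test looks the name up on `app`, not on `library`.
    return app.Client()


app.run = run

# Patching the source has no effect: `app` still holds the original reference.
with patch.object(library, "Client", return_value="mock"):
    patched_at_source = app.run()

# Patching the name where it is used replaces what the code under test sees.
with patch.object(app, "Client", return_value="mock"):
    patched_at_use = app.run()
```

Here `patched_at_source` is still the real result, while `patched_at_use` is the mocked one, which is why the tests below patch `src.data.mocking_google.storageClient` rather than `google.cloud.storage.Client`.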

Here is a Python script to be tested, mocking_google.py, containing references to Google Storage and BigQuery APIs:

from google.cloud.bigquery import Client as bigqueryClient
from google.cloud.storage import Client as storageClient

def list_blobs():

    storage_client = storageClient(project='test')

    blobs = storage_client.list_blobs('bucket', prefix='prefix')

    return blobs

def extract_table():

    bigquery_client = bigqueryClient(project='test')

    job = bigquery_client.extract_table('project.dataset.table_id', destination_uris='uri')

    return job

Here is the unit test:

import pytest
from unittest.mock import Mock, patch

from src.data.mocking_google import list_blobs, extract_table

@pytest.fixture
def extract_result():
    'Mock extract_job result with properties needed'
    er = Mock()
    er.return_value = 1
    return er

@pytest.fixture
def extract_job(extract_result):
    'Mock extract_job with properties needed'
    ej = Mock()
    ej.job_id = 1
    ej.result.return_value = 2
    return ej

@patch("src.data.mocking_google.storageClient")
def test_list_blobs(storageClient):

    storageClient().list_blobs.return_value = [1,2]

    blob_list = list_blobs()

    storageClient().list_blobs.assert_called_with('bucket', prefix='prefix')
    assert blob_list == [1,2]

@patch("src.data.mocking_google.bigqueryClient")
def test_extract_table(bigqueryClient,extract_job):

    bigqueryClient().extract_table.return_value = extract_job

    job = extract_table()

    bigqueryClient().extract_table.assert_called_with('project.dataset.table_id', destination_uris='uri')
    assert job.job_id == 1
    assert job.result() == 2
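The tests above call `storageClient()` inside the test to configure the same object that the code under test receives. A small sketch of why that works, using a bare `Mock` named `client_cls` (a hypothetical stand-in for the patched class):

```python
from unittest.mock import Mock

client_cls = Mock(name="storageClient")

# A Mock returns the same child object on every call, whatever the
# arguments are. That object is also available as `return_value`.
same_instance = client_cls() is client_cls(project="test")
same_as_return_value = client_cls() is client_cls.return_value

# Configure the "instance" method, then call it the way production code would.
client_cls().list_blobs.return_value = [1, 2]
blobs = client_cls(project="test").list_blobs("bucket", prefix="prefix")

# The call made through one "instance" is visible through any other.
client_cls().list_blobs.assert_called_with("bucket", prefix="prefix")
```

One consequence: each `client_cls()` in the test is itself recorded as a call on the class, so if you want to assert how the class was constructed, configuring via `client_cls.return_value` avoids adding extra calls.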

Here are the test results:

pytest -v src/tests/data/test_mocking_google.py
============================================================ test session starts =============================================================
platform darwin -- Python 3.7.6, pytest-5.3.5, py-1.8.1, pluggy-0.13.1 -- /Users/gaya/.local/share/virtualenvs/autoencoder-recommendation-copy-zpYZ6J1x/bin/python3
cachedir: .pytest_cache
rootdir: /Users/gaya/Documents/GitHub/mlops-autoencoder-recommendation, inifile: tox.ini
plugins: cov-2.8.1
collected 2 items                                                                                                                            

src/tests/data/test_mocking_google.py::test_list_blobs PASSED                                                                          [ 50%]
src/tests/data/test_mocking_google.py::test_extract_table PASSED                                                                       [100%]

============================================================= 2 passed in 1.14s ==============================================================
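The same approach could be applied to the script in the question, roughly as sketched below. Several things here are assumptions: `get_table_ids` is a hypothetical trimmed-down variant of `get_dataset()` that returns the table IDs instead of printing them, and `bigquery` is a stand-in namespace so the sketch runs without google-cloud-bigquery installed. In a real test you would keep `from google.cloud import bigquery` in your module and patch that name in your module instead.

```python
from types import SimpleNamespace
from unittest.mock import Mock, patch

# Stand-in so this sketch runs anywhere. In real code this line would be
# `from google.cloud import bigquery`.
bigquery = SimpleNamespace(Client=None)


def get_table_ids(dataset_id):
    """Hypothetical variant of get_dataset() that returns the table IDs."""
    client = bigquery.Client.from_service_account_json("some_client_secret.json")
    dataset = client.get_dataset(dataset_id)
    return [table.table_id for table in client.list_tables(dataset)]


def test_get_table_ids():
    with patch.object(bigquery, "Client") as client_cls:
        # The object the code under test gets back from the factory method.
        client = client_cls.from_service_account_json.return_value
        client.list_tables.return_value = [Mock(table_id="t1"), Mock(table_id="t2")]

        table_ids = get_table_ids("some_project.some_dataset")

    client_cls.from_service_account_json.assert_called_once_with(
        "some_client_secret.json"
    )
    client.get_dataset.assert_called_once_with("some_project.some_dataset")
    # list_tables should receive whatever get_dataset returned.
    client.list_tables.assert_called_once_with(client.get_dataset.return_value)
    assert table_ids == ["t1", "t2"]
```

Because the client is created through `from_service_account_json` rather than by calling the class, the mock to configure is `from_service_account_json.return_value`, not `client_cls()`. No credentials file is read, since the real factory method is never invoked.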

Happy to explain further if how this works is not clear :)

pink spikyhairman
  • Do you need valid credentials to run the test? – Andy Carlson May 08 '20 at 14:43
  • As long as you're mocking the Google API client, like `storageClient` or `bigqueryClient` in my example, then no, because it never tries to open a connection to Google – pink spikyhairman May 08 '20 at 18:16
  • What a great answer! Thanks for sharing. I've just seen this from a bump email, apologies! The explanation is great, thank you! – Py.Jordan May 11 '20 at 16:55
  • Is it possible to mock bigquery_client if initialized in the main scope, right after the imports? – dinigo Nov 05 '20 at 16:00
  • @dinigo yes, patching patches the code, not instances – pink spikyhairman Nov 06 '20 at 14:12
  • @pinkspikyhairman can you explain why `extract_job` needs `extract_result`? and what happens when the test runs and the fixtures kick in? – cryanbhu Dec 16 '20 at 04:27
  • @cryanbhu fixtures are created before any test runs. They can be used to mock a single function, a class with many methods, dataclasses, etc. `extract_job` illustrates how to mock a job 'object' which has both a `job_id` attribute and a `result` method with a `return_value`. As you astutely noticed, I probably intended `extract_result` to demonstrate using a mocked object within a mocked object and return that mocked object as `extract_job`'s `result` return value... but didn't :) – pink spikyhairman Dec 16 '20 at 21:55
7

I also found it hard to get around the authentication part and mock only the method calls, so I ended up just mocking the whole library. :facepalm:

import sys

from unittest.mock import MagicMock

sys.modules["google.cloud.storage"] = MagicMock()

from your_application import make_app


def test_make_app():
    make_app()
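A scoped variant of the same idea, as a hedged sketch: `patch.dict` on `sys.modules` installs the stub only for the duration of a block and restores the previous state afterwards, so other tests are not affected. `fake_storage` and the blob names are hypothetical, and `importlib.import_module` is used here only to keep the sketch independent of whether the real `google` package is installed.

```python
import importlib
import sys
import types
from unittest.mock import MagicMock, patch

# Hypothetical stub module standing in for google.cloud.storage.
fake_storage = types.ModuleType("google.cloud.storage")
fake_storage.Client = MagicMock(name="Client")
fake_storage.Client.return_value.list_blobs.return_value = ["a.csv", "b.csv"]

with patch.dict(sys.modules, {"google.cloud.storage": fake_storage}):
    # Inside the block, importing the module by name yields the stub.
    storage = importlib.import_module("google.cloud.storage")
    blobs = storage.Client(project="test").list_blobs("bucket", prefix="prefix")

# Outside the block, sys.modules is back to what it was before.
stub_was_used = storage is fake_storage
stub_removed = sys.modules.get("google.cloud.storage") is not fake_storage
```

As with the unscoped version, the stub has to be in place before the module under test is imported, otherwise that module will already hold a reference to the real library.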
  • Thanks for sharing your answer. It's a right pain; however, I have had greater success with patching and mocking parts of my code. – Py.Jordan Nov 05 '19 at 12:57