Rather than using the row_start
and row_stop
arguments in Table.scan(), this may be a better fit for the filter
argument with a regular expression.
See the API reference for details on the filter argument:
The keyword argument filter
is also supported (beyond column and row
range filters supported here). HappyBase / HBase users will have used
this as an HBase filter string. (See the Thrift docs for more details
on those filters.) However, Google Cloud Bigtable doesn’t support
those filter strings so a RowFilter
should be used instead.
RowFilter is a type provided by Google's Bigtable library. Here are the docs. Assuming that the ID field you're referring to is your row key, we can use RowKeyRegexFilter to filter the IDs by the pattern you've described.
We'll start by coming up with a regular expression to match a list of IDs for the desired months. For example, if you wanted to filter year-based IDs for the months of December and January, you could use this (note that you must go from the largest number to the shortest) -- see this link to test the regular expression:
\d\d\d\d\/(12|1)
Here's an attempt to write a function that creates a Google Bigtable HappyBase scan call with an appropriate filter, where table
is a HappyBase table and months
is a list of integers. Please note that I have not tested this code, but hopefully it at least gives you a starting point.
from google.cloud.bigtable.row_filters import RowKeyRegexFilter
def filter_by_months(table, months):
months_reversed = sorted(months, reverse=True)
months_strings = [str(month) for month in months_reversed]
months_joined = "|".join(months_strings)
key_filter = RowKeyRegexFilter('\d\d\d\d\/({})'.format(months_joined))
return table.scan(filter=key_filter)