
I currently have multiple datasets within a project and I would like to know if there's a way to find out, in one go, which tables have not been updated in the last 90 days, meaning they fall under the long-term storage bucket in BigQuery. Is there some SQL syntax for this?

Thank you

Justine
  • Does this answer your question? [Get the Last Modified date for all BigQuery tables in a BigQuery Project](https://stackoverflow.com/questions/44288261/get-the-last-modified-date-for-all-bigquery-tables-in-a-bigquery-project) – Pentium10 Oct 31 '19 at 14:30
  • thanks, I'm actually looking for more elaborate results. Since I have, for example, 50 datasets with 20 tables each, I would have to check each dataset one by one. Is there any way to check all tables in all datasets at once to know their size and when they were last modified? – Justine Nov 01 '19 at 13:01

2 Answers


Not SQL, but a possible solution is using the `bq` command-line tool. Describing a table returns the number of long-term storage bytes (`numLongTermBytes`):

$ bq show --format=prettyjson dataset.table
...
  "kind": "bigquery#table", 
  "lastModifiedTime": "1534845362446", 
  "location": "US", 
  "numBytes": "56", 
  "numLongTermBytes": "56", 
  "numRows": "3", 
... 

Therefore, you can extend it to list all tables in each dataset and get the desired information for each. Based on this, here is a quick example:

#!/bin/bash

project=PROJECT_ID
dataset=DATASET_NAME
max_results=100

# get list of tables
tables=$(bq ls --max_results $max_results "$project:$dataset" | awk '{print $1}' | tail -n +3)

# get LTS bytes for each table
for table in $tables
do
    printf '%-35s %-50s\n' "$table" "$(bq show --format prettyjson "$project:$dataset.$table" | grep numLongTermBytes)"
done

and you'll get an output similar to:

Dfp                                   "numLongTermBytes": "0",                        
SO_55506947                           "numLongTermBytes": "144",                      
SO_55506947_bis                       "numLongTermBytes": "144",                      
a                                     "numLongTermBytes": "7",                        
a1                                    "numLongTermBytes": "399",                      
aaa                                   "numLongTermBytes": "8",                        
adaptive                              "numLongTermBytes": "1085",                     
adaptive_view                         "numLongTermBytes": "0",                        
audience_segment_map_exp_test         "numLongTermBytes": "300",                      
b                                     "numLongTermBytes": "7", 
... 
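Since the original question is about the 90-day threshold rather than the byte counts, the `lastModifiedTime` field from the same `bq show` output (milliseconds since the Unix epoch) can be turned into an age in days. A small sketch, where `age_days` is a helper name introduced here for illustration:

```shell
# lastModifiedTime in the `bq show` output is milliseconds since the Unix epoch.
# age_days MILLIS NOW_SECONDS -> whole days between the two instants
age_days() {
  local ms=$1 now_s=$2
  # Drop the millisecond remainder, then divide by seconds per day.
  echo $(( (now_s - ms / 1000) / 86400 ))
}

# e.g. using the lastModifiedTime value shown above against the current time:
age_days 1534845362446 "$(date +%s)"
```

A table whose result is 90 or more is in long-term storage.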
Guillem Xercavins

I had the same question, but needed it across all the datasets in a given project. I took @Guillem Xercavins' answer, wrapped it in another for loop, and made the project a command-line argument.

#!/bin/bash

project=$1
max_results=1000


# Get the datasets

datasets=$(bq ls --datasets=true --project_id="$project" | awk '{print $1}' | tail -n +3)


for dataset in $datasets
do
   # get list of tables
   # The tail -n +3 strips off the header lines
   tables=$(bq ls --max_results $max_results "$project:$dataset" | awk '{print $1}' | tail -n +3)

   # get LTS bytes for each table
   for table in $tables
   do
       printf '%-35s %-50s\n' "$table" "$(bq show --format prettyjson "$project:$dataset.$table" | grep numLongTermBytes)"
   done
done
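Since the question asked for SQL specifically: the per-dataset loop above can also be expressed as a query, because BigQuery exposes table metadata through the `__TABLES__` meta-view of each dataset (columns include `table_id`, `last_modified_time` in epoch milliseconds, and `size_bytes`). A sketch in standard SQL, with `project.dataset` as placeholders; it still has to be run once per dataset:

```sql
-- Tables in one dataset not modified in the last 90 days
SELECT
  dataset_id,
  table_id,
  TIMESTAMP_MILLIS(last_modified_time) AS last_modified,
  size_bytes
FROM `project.dataset.__TABLES__`
WHERE last_modified_time <
      UNIX_MILLIS(TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY))
ORDER BY last_modified
```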
nomadic_squirrel