0

We are trying to use these in us-west2 - https://github.com/GoogleCloudPlatform/bigquery-utils/tree/master/udfs/community.

this first query processes just fine, in US enter image description here

this second query wont run enter image description here

Our dataset models is in us West 2. It seems all queries from the 2nd query editor are then processed in us-west 2 where, it seems bqutil does not exist? How can we find the function bqutil.fn.levenshtein when processing in us-west2 (where our datasets all exist)?

Canovice
  • 9,012
  • 22
  • 93
  • 211
  • check out https://stackoverflow.com/a/52263170/5221944 for levenshtein UDF implementation. also just search for "levenshtein " here on SO to see many more options – Mikhail Berlyant Feb 01 '22 at 05:22

1 Answers1

1

To use the levenshtein UDF in your BigQuery table, you need to create a UDF in the location where your dataset resides.

You can refer to the below UDF and the screenshot where the data resides in us-west2 location.

UDF :

CREATE OR REPLACE FUNCTION
`stackdemo.fn_LevenshteinDistance`(in_a STRING, in_b STRING) RETURNS INT64 LANGUAGE js AS R"""

var a = in_a.toLowerCase();
var b = in_b.toLowerCase();
 if(a.length == 0) return b.length;
if(b.length == 0) return a.length;
var matrix = [];
// increment along the first column of each row
var i;
for(i = 0; i <= b.length; i++){
 matrix[i] = [i];
}
// increment each column in the first row
var j;
for(j = 0; j <= a.length; j++){
 matrix[0][j] = j;
}
// Fill in the rest of the matrix
for(i = 1; i <= b.length; i++){
 for(j = 1; j <= a.length; j++){
   if(b.charAt(i-1) == a.charAt(j-1)){
     matrix[i][j] = matrix[i-1][j-1];
   } else {
     matrix[i][j] =
       Math.min(matrix[i-1][j-1] + 1, // substitution
       Math.min(matrix[i][j-1] + 1, // insertion
       matrix[i-1][j] + 1)); // deletion
   }
 }
}
return matrix[b.length][a.length];
""";

Query :

SELECT
 source,
 target,
 `stackdemo.fn_LevenshteinDistance`(source, target) distance,
FROM UNNEST([
 STRUCT('analyze' AS source, 'analyse' AS target),
 STRUCT('opossum', 'possum'),
 STRUCT('potatoe', 'potatoe'),
 STRUCT('while', 'whilst'),
 STRUCT('aluminum', 'alumininium'),
 STRUCT('Connecticut', 'CT')
]);

Output :

enter image description here

Sandeep Mohanty
  • 1,419
  • 5
  • 13