1

For example i have:

1: 6c71d997ba39
2: 6c71d997d269

I need to get 4.

Andru
  • 260
  • 3
  • 16

3 Answers3

2

You can consider using Levenshtein distance for your use-case

the Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other

Below example is for BigQuery Standard SQL

#standardSQL
CREATE TEMPORARY FUNCTION EDIT_DISTANCE(string1 STRING, string2 STRING)
RETURNS INT64
LANGUAGE js AS """
  var _extend = function(dst) {
    var sources = Array.prototype.slice.call(arguments, 1);
    for (var i=0; i<sources.length; ++i) {
      var src = sources[i];
      for (var p in src) {
        if (src.hasOwnProperty(p)) dst[p] = src[p];
      }
    }
    return dst;
  };

  var Levenshtein = {
    /**
     * Calculate levenshtein distance of the two strings.
     *
     * @param str1 String the first string.
     * @param str2 String the second string.
     * @return Integer the levenshtein distance (0 and above).
     */
    get: function(str1, str2) {
      // base cases
      if (str1 === str2) return 0;
      if (str1.length === 0) return str2.length;
      if (str2.length === 0) return str1.length;

      // two rows
      var prevRow  = new Array(str2.length + 1),
          curCol, nextCol, i, j, tmp;

      // initialise previous row
      for (i=0; i<prevRow.length; ++i) {
        prevRow[i] = i;
      }

      // calculate current row distance from previous row
      for (i=0; i<str1.length; ++i) {
        nextCol = i + 1;

        for (j=0; j<str2.length; ++j) {
          curCol = nextCol;

          // substution
          nextCol = prevRow[j] + ( (str1.charAt(i) === str2.charAt(j)) ? 0 : 1 );
          // insertion
          tmp = curCol + 1;
          if (nextCol > tmp) {
            nextCol = tmp;
          }
          // deletion
          tmp = prevRow[j + 1] + 1;
          if (nextCol > tmp) {
            nextCol = tmp;
          }

          // copy current col value into previous (in preparation for next iteration)
          prevRow[j] = curCol;
        }

        // copy last col value into previous (in preparation for next iteration)
        prevRow[j] = nextCol;
      }

      return nextCol;
    }

  };

  var the_string1;

  try {
    the_string1 = decodeURI(string1).toLowerCase();
  } catch (ex) {
    the_string1 = string1.toLowerCase();
  }

  try {
    the_string2 = decodeURI(string2).toLowerCase();
  } catch (ex) {
    the_string2 = string2.toLowerCase();
  }

  return Levenshtein.get(the_string1, the_string2) 

""";   

WITH strings AS (
  SELECT '1: 6c71d997ba39' string1, '2: 6c71d997d269' string2
)
SELECT string1, string2, EDIT_DISTANCE(string1, string2) changes
FROM   strings

with result

Row     string1             string2             changes  
1       1: 6c71d997ba39     2: 6c71d997d269     4    
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
0
SELECT
  (SELECT COUNTIF(c != s2[OFFSET(off)])
   FROM UNNEST(SPLIT(s1, '')) AS c WITH OFFSET off) AS count
FROM dataset.table
Elliott Brossard
  • 32,095
  • 2
  • 67
  • 99
  • While this code snippet may solve the problem, it doesn't explain why or how it answers the question. Please [include an explanation for your code](//meta.stackexchange.com/q/114762/269535), as that really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. – Luca Kiebel Sep 09 '18 at 08:45
0

Source: https://stackoverflow.com/a/57499387/11059644

Ready to use shared UDFs - Levenshtein distance:

SELECT fhoffa.x.levenshtein('felipe', 'hoffa'), fhoffa.x.levenshtein('googgle', 'goggles'), fhoffa.x.levenshtein('is this the', 'Is This The')