Questions tagged [hamming-distance]

The Hamming distance is a mathematical distance function for a pair of strings (sequences) that can be computed with a binary calculation. It counts the number of symbols in the string that are different. Posts that are not about implementation may belong on https://math.stackexchange.com.

For the special case of two binary strings, it may be implemented as the bitcount of their XOR:

int H(int a, int b){
    return popcount(a^b);
}
291 questions
84
votes
7 answers

Efficiently find binary strings with low Hamming distance in large set

Problem: Given a large (~100 million) list of unsigned 32-bit integers, an unsigned 32-bit integer input value, and a maximum Hamming Distance, return all list members that are within the specified Hamming Distance of the input value. Actual data…
Eric J.
  • 147,927
  • 63
  • 340
  • 553
57
votes
2 answers

Hamming Distance vs. Levenshtein Distance

For the problem I'm working on, finding distances between two sequences to determine their similarity, sequence order is very important. However, the sequences that I have are not all the same length, so I pad any deficient strings with empty points…
don
  • 820
  • 1
  • 6
  • 10
43
votes
7 answers

Similar image search by pHash distance in Elasticsearch

Similar image search problem Millions of images pHash'ed and stored in Elasticsearch. Format is "11001101...11" (length 64), but can be changed (better not). Given subject image's hash "100111..10" we want to find all similar image hashes in…
TautrimasPajarskas
  • 2,686
  • 1
  • 32
  • 40
27
votes
2 answers

Hamming distance on binary strings in SQL

I have a table in my DB where I store SHA256 hashes in a BINARY(32) column. I'm looking for a way to compute the Hamming distance of the entries in the column to a supplied value, i.e. something like: SELECT * FROM table ORDER BY…
CAFxX
  • 28,060
  • 6
  • 41
  • 66
23
votes
10 answers

Shortest path to transform one word into another

For a Data Structures project, I must find the shortest path between two words (like "cat" and "dog"), changing only one letter at a time. We are given a Scrabble word list to use in finding our path. For example: cat -> bat -> bet -> bot -> bog ->…
dacman
  • 609
  • 1
  • 8
  • 17
22
votes
6 answers

Inverse of Hamming Distance

*This is a brief introduction, the specific question is in bold at the last paragraph. I'm trying to generate all strings with a given Hamming Distance to solve efficiently a bioinformatic assignment. The idea is, given a string (ie.…
JackS
  • 423
  • 3
  • 12
19
votes
4 answers

Efficiently build a graph of words with given Hamming distance

I want to build a graph from a list of words with Hamming distance of (say) 1, or to put it differently, two words are connected if they only differ from one letter (lol -> lot). so that given words = [ lol, lot, bot ] the graph would be { 'lol'…
laurids
  • 931
  • 9
  • 24
16
votes
1 answer

Mysql hamming distance of hexadecimal values

I have some hashes stored in mysql, which I would fetch with comparison by hamming distance. Hashes stored are these: qw 1 ffe71b001820a1fd qw 2 ffffb81c1c3838a0 qw 3 fff8381c1c3e3828 qw 4 fffa181c3c2e3920 qw 5 fffa981c1c3e2820 qw 6…
125fura
  • 383
  • 2
  • 13
15
votes
3 answers

Finding a number of maximally different binary vectors from a set

Consider the set, S, of all binary vectors of length n where each contains exactly m ones; so there are n-m zeros in each vector. My goal is to construct a number, k, of vectors from S such that these vectors are as different as possible from each…
15
votes
1 answer

mysql hamming distance between two phash

I have a table A which has a column 'template_phash'. I store the phash generated from 400K images. Now I take a random image and generate a phash from that image. Now how do I query so that I can get the record from table A which hamming distance…
Gagan
  • 4,278
  • 7
  • 46
  • 71
13
votes
4 answers

Most efficient way to calculate hamming distance in ruby?

In ruby, what is the most efficient way to calculate the bit difference between two unsigned integers (e.g. the hamming distance)? Eg, I have integer a = 2323409845 and b = 1782647144. Their binary representations are: a =…
ch3rryc0ke
  • 2,793
  • 1
  • 22
  • 28
13
votes
4 answers

Fast Hamming distance scoring

There is a database with N fixed length strings. There is a query string of the same length. The problem is to fetch first k strings from the database that have the smallest Hamming distance to q. N is small (about 400), strings are long, fixed in…
Sardar
  • 131
  • 1
  • 4
13
votes
1 answer

Computing Hamming distances to several strings with SSE

I have n (8 bit) character strings all of them of the same length (say m), and another string s of the same length. I need to compute Hamming distances from s to each of the others strings. In plain C, something like: unsigned char…
pepeStck
  • 217
  • 2
  • 8
12
votes
2 answers

Sorting strings so that hamming distance is low between adjacent strings

Problem: I have N (~100k-1m) strings each D (e.g. 2000) characters long and with a low alphabet (eg 3 possible characters). I would like to sort these strings such that there are as few possible changes between adjacent strings (eg hamming distance…
bwgoudey
  • 123
  • 1
  • 5
12
votes
5 answers

Fast hamming distance computation between binary numpy arrays

I have two numpy arrays of the same length that contain binary values import numpy as np a=np.array([1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0]) b=np.array([1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1]) I want…
benbo
  • 1,471
  • 1
  • 16
  • 29
1
2 3
19 20