Questions tagged [string-metric]

In computer science, a string metric is a function that measures the distance (inverse similarity) between two strings.

15 questions
44
votes
5 answers

How to compare almost similar Strings in Java? (String distance measure)

I would like to compare two strings and get a score for how much they look alike. For example "The sentence is almost similar" and "The sentence is similar". I'm not familiar with existing methods in Java, but for PHP I know the levenshtein…
hsmit
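As a rough illustration of the kind of score the question above asks for, here is a minimal Python sketch of the Levenshtein distance (the Wagner-Fischer dynamic program) together with a normalized similarity. The function names `levenshtein` and `similarity` and the normalization by the longer string's length are assumptions made for this example, not taken from the question or from any particular library.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the Wagner-Fischer dynamic program, keeping two rows."""
    if len(a) < len(b):
        a, b = b, a  # make b the shorter string so the rows stay small
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical normalization: 1.0 for identical strings, 0.0 for maximal distance."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("The sentence is almost similar", "The sentence is similar"))  # 7
```

The two example sentences differ only by the 7 characters of "almost ", so the distance is 7 and the normalized score is 1 - 7/30. Other normalizations are possible and may suit a given use case better.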
20
votes
6 answers

Alternative to Levenshtein and Trigram

Say I have the following two strings in my database: (1) 'Levi Watkins Learning Center - Alabama State University' (2) 'ETH Library' My software receives free text inputs from a data source, and it should match those free texts to the pre-defined…
Jonas Sourlier
8
votes
8 answers

Mapping arbitrary strings to RGB values

I have a huge set of arbitrary natural language strings. For my tool to analyze them I need to convert each string to a unique color value (RGB or other). I need color contrast to depend on string similarity (the more one string differs from another,…
Alexander Gladysh
7
votes
1 answer

Levenshtein Matrix using only a diagonal strip

According to Wikipedia there's a possible modification to the Wagner-Fischer algorithm that can calculate whether the Levenshtein distance of two words is lower than a certain threshold, which is much quicker than the original one if that's all you want…
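The threshold idea mentioned in this excerpt can be sketched as follows. This is only one possible reading of the technique: it fills in cells whose row and column indices differ by at most `k` (a diagonal strip) and treats everything outside the strip as "too far". The function name `within_distance` and the parameter `k` are hypothetical.

```python
def within_distance(a: str, b: str, k: int) -> bool:
    """Return True if the Levenshtein distance between a and b is at most k.

    Only cells (i, j) with |i - j| <= k are computed. For clarity this sketch
    still allocates full-length rows; a tuned version would store only the
    2k + 1 cells of the strip.
    """
    if k < 0:
        return False
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return False  # the length difference alone already exceeds k
    too_far = k + 1   # stands in for every value greater than k
    previous = [j if j <= k else too_far for j in range(m + 1)]
    for i in range(1, n + 1):
        current = [too_far] * (m + 1)
        if i <= k:
            current[0] = i
        row_min = current[0]
        for j in range(max(1, i - k), min(m, i + k) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            current[j] = min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
                too_far,                 # clamp values above k
            )
            row_min = min(row_min, current[j])
        if row_min >= too_far:
            return False  # every cell in the strip exceeds k, so the result must too
        previous = current
    return previous[m] <= k
```

The inner loop visits at most 2k + 1 cells per row, which is where the speed-up over the full matrix comes from when `k` is small relative to the string lengths.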
6
votes
8 answers

"Absolute" string metric

I have a huge (but finite) set of natural language strings. I need a way to convert each string to a numeric value. For any given string the value must be the same every time. The more "different" two given strings are, the more different two…
Alexander Gladysh
6
votes
2 answers

How do I compare the similarity of person names using a metric?

I am working on a function that tolerates misspellings and aliases of person names. I have done some research and found quite a number of string metric algorithms, and phonetic libraries too. I have tried some and of all those…
Vamsidhar
3
votes
0 answers

Block edit distance with Swapping only

Suppose I have an alphabet of distinct symbols Σ = {a1, a2, ..., an}. I also have two permutations of these symbols, let's call them A and B. How can I find the edit distance between A and B with block edit operations allowed? To make it clearer, an example would be…
AspiringMat
2
votes
2 answers

Find element similarity within a collection of strings without evaluating all element pairs

So the problem collection is something like: A = {'abc', 'abc', 'abd', 'bcde', 'acbdg', ...} Using some type of string metric like Levenshtein distance, it's simple enough to find some sort of heuristic of string similarity between 2…
Dennis
1
vote
4 answers

Diff comparison word by word and display changes

This could be marked as a duplicate, but I haven't found a proper solution yet. I need to write a function that compares 2 pieces of text word by word, and prints out the text showing added/deleted/changed words. For example: StringOriginal = "I am Tim…
Tim Maes
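One way to approach a word-by-word comparison like the one described above is to split both texts into words and align the two word lists. The sketch below uses `difflib.SequenceMatcher` from Python's standard library; the function name `word_diff` and the example sentences are made up for illustration, and the question's own language and sample input may differ.

```python
import difflib


def word_diff(original: str, revised: str):
    """Align two texts word by word and report what changed.

    Returns a list of (tag, old_words, new_words) tuples, where tag is one of
    'equal', 'replace', 'delete' or 'insert' as produced by difflib.
    """
    old_words = original.split()
    new_words = revised.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    return [
        (tag, old_words[i1:i2], new_words[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]


for tag, old, new in word_diff("the quick brown fox", "the slow brown fox jumps"):
    print(tag, old, new)
```

Splitting on whitespace keeps punctuation attached to words; a real implementation would probably need a tokenizer that matches how the text should be displayed.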
1
vote
1 answer

Should I use StringMetric or MultisetMetric for comparing these Strings with Simmetrics?

I've been using the Simmetrics Java library with good success for comparing two Strings. But there seem to be two approaches and I need a combination of both for my scenario. Currently I am using CosineSimilarity (I do use…
Paul Taylor
1
vote
0 answers

String metric that has low weight for extraneous characters

I'm trying to find a string metric to find the most similar entry in my list to an arbitrary input. It looks like most common string metrics place heavy weight on extraneous characters, even if a substring matches perfectly. For example, 'Corvette,…
ericksonla
0
votes
1 answer

Weighted Distance Matrix for QWERTZ Keyboard for Levenshtein Distance Algorithm

I have a weight matrix for a Levenshtein distance algorithm, which looks like this: int[,] weights = new int[6, 6] { { 0, 1, 2, 1, 1, 2 }, { 1, 0, 1, 2, 1, 2 }, { 2, 1, 0, 3, 2, 3 }, { 1, 2, 3, 0, 1, 2 }, …
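To show how a substitution weight table can plug into the Levenshtein recurrence, here is a small Python sketch. It does not reproduce the truncated matrix from the excerpt. The neighbour set, the cost values, and the names `weighted_levenshtein` and `qwertz_cost` are all hypothetical and cover only a few adjacent keys on the top row of a QWERTZ layout.

```python
def weighted_levenshtein(a, b, substitution_cost, indel_cost=2):
    """Levenshtein distance where replacing x by y costs substitution_cost(x, y)."""
    previous = [j * indel_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        current = [i * indel_cost]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + indel_cost,                     # delete ca
                current[j - 1] + indel_cost,                  # insert cb
                previous[j - 1] + substitution_cost(ca, cb),  # substitute or match
            ))
        previous = current
    return previous[-1]


# Illustrative only: a few horizontally adjacent keys on a QWERTZ top row.
NEIGHBOURS = {frozenset("qw"), frozenset("we"), frozenset("tz"), frozenset("zu")}


def qwertz_cost(x: str, y: str) -> int:
    """Hypothetical costs: 0 for a match, 1 for neighbouring keys, 2 otherwise."""
    if x == y:
        return 0
    return 1 if frozenset((x, y)) in NEIGHBOURS else 2


print(weighted_levenshtein("tag", "zag", qwertz_cost))  # 1: 't' and 'z' are neighbours
print(weighted_levenshtein("tag", "mag", qwertz_cost))  # 2: 't' and 'm' are not
```

A full weight matrix indexed by key position, as in the excerpt, could replace `qwertz_cost` without changing the recurrence.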
0
votes
0 answers

Recommended String Metric Algorithm for string detection?

We're trying to choose a string metric algorithm for our string comparison program. Which would be the best string metric algorithm if we want to detect misspellings and alterations of a word, such as changing letters to words or symbols, adding extra…
0
votes
1 answer

Identify strings with the same meaning in Java

I have the following problem. I want to identify strings in Java that have a similar meaning. I tried to calculate similarities between strings with Stringmetrics. This works as expected but I need something more convenient. For example when I have…
sstoeferle
0
votes
1 answer

PostgreSQL: Processing Text, Detect Rows out of Alphabetical Order

I have some processed text that's in (mostly) alphabetical order, e.g. these are the first words of each paragraph: Adelanto Agoura Hills Alameda Albany Old Albany New Albany Alhambra Aliso Viejo Alturas So each of the words above represents…
MichaelStoner