Check Similarity String Between Column In Google Sheet

Question

I have a table in the link below:

https://docs.google.com/spreadsheets/d/1EOALaBVzHijUP_8dM1Sr7KTutdTah8b9Q0xDRoNHBLo/edit#gid=0

if the text is split first, then check what do you do? for example "Kebumen District Office" Vs "District Head Office of Kebumen District" Then we need 7x7 columns = 49 columns because we will match for each word words 1-1, 1-2, 1-3, 1-4, 2-1, 2-2.2-3.2-4, etc. The text in column B is split and then checked for each word with the text in column A. If in column B there are many different words found the text is not similar.

Only I am still confused to make the formula. Please give me the solution sir. Thanks.

Please add all variants of input data and results you want right in the question. The link to the file is good when you're doing with some data model, but also good to include a picture of your data. — Max Makhrov, Aug 27 '18 at 14:34

Max Makhrov · Accepted Answer · 2018-08-28T08:16:10.493

1

The matching patterns are very different in your case and I see no solution based on formulas (regular expressions).

You may need to find articles about fuzzy vlookup.

Here's what I found for google sheets (not tested):

Addon, find fuzzy matches

This problem is common for Excel, there're solutions based on vba.

As I said, the one formula won't solve your task because you have many cases. First example Mc Donald vs McDonald is checked easily with a formula:

= substitute(A, " ", "") = substitute(B, " ", "")

Your next samples are different. You may use some code, but even this won't give the expected results. My suggestion: split the task into small cases and try to solve them separately. Make an investigation or ack a new question for each case.

Your second and 3-d lines are case2. In this case, you need to check all the words in A are also in B. You'll need to try solving it and ask another question if needed. And so on.

edited Aug 28 '18 at 08:16

answered Aug 28 '18 at 07:55

Max Makhrov

17,309
5
55
81

if the text is split first, then check what do you do? for example "Kebumen District Office" Vs "District Head Office of Kebumen District" Then we need 7x7 columns = 49 columns because we will match for each word words 1-1, 1-2, 1-3, 1-4, 2-1, 2-2.2-3.2-4, etc. only I am still confused to make the formula – Aug 28 '18 at 08:04
the text in column B is split and then checked for each word with the text in column A. If in column B there are many different words found the text is not similar. – Aug 28 '18 at 08:08
Please see my edit to the answer. Looks like your question is multiple questions in one. No way to answer it fully. – Max Makhrov Aug 28 '18 at 08:11
@lutfi-creativesys, I'm sorry, I have nothing to add to my answer. Hope this was helpful info. – Max Makhrov Aug 28 '18 at 08:35
thank you for the solution sir. fuzzy vlookup keywords have really helped my problem. – Aug 28 '18 at 09:01

score 0 · Answer 2 · 2019-07-27T05:24:48.770

0

Fuzzy matching is definitely the way to go. Different algorithms have different strengths and weaknesses. My suggestion is that you visit the G Suite marketplace and look for Flookup or simply follow this link:

Flookup for Google Sheets

It'll allow you to look for matches ranging from 0% to 100% similarity. The basic formula is:

FLOOKUP(lookupValue, tableArray, lookupCol, indexNum, [threshold], [rank])

Find out more from the official website.

Edit: I'm the creator of Flookup.

edited Jul 27 '19 at 05:24

answered Sep 18 '18 at 06:00

1

Flookup looks very interesting but it's suspiciously anonymous. The terms and conditions are an Apache license, but without a link to the source code. Is this meant to be open-source? – Jacek Kopecký Jun 04 '19 at 20:42
_suspiciously anonymous_? What information would you require to put your suspicions to rest. The code was initially meant to be open source but I changed my mind after a few updates. Changes to the license will be added soon. – Jun 06 '19 at 06:17
Hi, @Andrew, it's impossible to find out anything about the author(s), and the inconsistency between the license and the lack of links to source was raising red flags. I can't link flookup to any reputable source, or any source at all. Hence it looks anonymous, and that's suspicious. Open up the code, or say who the author(s) are on the site please. – Jacek Kopecký Jun 07 '19 at 08:21
@Jacek thank you for your feedback and suggestion. I'll add information about the author (me) on the website and possibly the underlying logic the add-on uses. Otherwise, I updated the license to reflect the state of the add-on better. That said, please be rest assured that the add-on is completely safe to use and I'm willing to answer any privacy/security related questions about it. – Jun 07 '19 at 18:09

Check Similarity String Between Column In Google Sheet

2 Answers2

Linked