I am building proximity search on top of a positional inverted index (see "Implementing proximity search in positional inverted index nodejs"). This question is a sub-problem of that.
I have an array of arrays containing positions of different words on a page.
{
pageno: [
[positions of word 1],
[positions of word 2],
[positions of word n]
]
}
For example:
{
1 : [
[1, 5, 6],
[2, 41],
[4, 7, 11]
],
2 : [
[1, 5, 6],
[2, 41],
[3, 7, 11]
]
}
I want to find, for each pageNo, the number of occurrences (one position picked per word) such that the sum of the gaps between the positions of the words does not exceed a specified value (proximity).
If the value of proximity is 1, there must be no more than 1 extra word in total between the query words. So the query "Hello world nodejs" should match "Hello world in nodejs", as there is only one word in between: 'in'.
But it won't match 'hello from world in nodejs', as there are 2 words in between in total: 'from' and 'in'.
Note that jumbled words are allowed, i.e. the query words may appear on the page in any order.
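To pin down what I mean by proximity for a single combination of positions, here is a minimal sketch. The function name `gapSum` is only for illustration, and the example positions assume words are numbered from 1. It sorts the chosen positions and adds up the number of words sitting between each neighbouring pair.

```javascript
// Hypothetical helper: total number of words lying between the chosen positions.
// Sorting first is what makes a jumbled word order acceptable.
function gapSum(positions) {
  const sorted = [...positions].sort((a, b) => a - b);
  let total = 0;
  for (let i = 1; i < sorted.length; i++) {
    total += sorted[i] - sorted[i - 1] - 1; // words strictly between neighbours
  }
  return total;
}

// "Hello world in nodejs": hello=1, world=2, nodejs=4
console.log(gapSum([1, 2, 4])); // 1
// "hello from world in nodejs": hello=1, world=3, nodejs=5
console.log(gapSum([1, 3, 5])); // 2
// same positions given in jumbled order
console.log(gapSum([4, 1, 2])); // 1
```

A combination counts as a match when `gapSum(...)` is less than or equal to proximity.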
How can I do this in JavaScript? I was trying to adapt something like "Finding matches between multiple JavaScript Arrays", but couldn't make the necessary changes to get it working here.
The expected output for the above case would be (proximity: 2):
{
1 : 3,
2 : 3
}
Page 1: (1,2,4) with proximity (2-1-1)+(4-2-1)=1, (5,2,4) with proximity (5-4-1)+(4-2-1)=1, and (6,2,4) with proximity (4-2-1)+(6-4-1)=2
Page 2: (1,2,3), (5,2,3), (6,2,3)
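For reference, the counting rule can be written as a brute-force sketch that tries every combination of one position per word. The name `countMatches` is made up for illustration. The cost grows with the product of the list lengths, so this is only a way to state the expected behaviour, not the efficient approach I am looking for. It relies on the fact that the sum of the gaps between sorted positions equals max - min - (number of words - 1).

```javascript
// Brute-force reference (hypothetical name): count combinations, one position
// per word, whose total gap is within `proximity`.
function countMatches(index, proximity) {
  const result = {};
  for (const [pageNo, wordPositions] of Object.entries(index)) {
    let count = 0;
    const words = wordPositions.length;
    const pick = (wordIdx, min, max) => {
      if (wordIdx === words) {
        // words between the outermost positions, minus the query words themselves
        if (max - min - (words - 1) <= proximity) count++;
        return;
      }
      for (const pos of wordPositions[wordIdx]) {
        pick(wordIdx + 1, Math.min(min, pos), Math.max(max, pos));
      }
    };
    if (words > 0) pick(0, Infinity, -Infinity); // a page with no word lists has no matches
    result[pageNo] = count;
  }
  return result;
}

const index = {
  1: [[1, 5, 6], [2, 41], [4, 7, 11]],
  2: [[1, 5, 6], [2, 41], [3, 7, 11]],
};

console.log(countMatches(index, 2)); // { '1': 3, '2': 3 }
```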