
I have a Rails 3 application with a model that has a name and a geographic location (lat/lng). How would I go about searching for possible duplicates in this model? I want to create a cron job (or something similar) that checks whether two objects have a similar name and are less than 0.5 miles away from each other. If they match, we'll flag the objects.

I am using Ruby Geocoder and ThinkingSphinx in my application.

Kyle Decot
  • What do you mean by similar name? Do you mean the same name? different by cases? or even something like Vancouver & Vancuover – Kyle d'Oliveira Jul 20 '11 at 01:25
  • [levenshtein](http://raa.ruby-lang.org/project/levenshtein/) looks like what I want I think. – Kyle Decot Jul 20 '11 at 01:43
  • Maybe you could think of a way to put that as a validation in your model instead of a cron job. – christianblais Jul 20 '11 at 03:21
  • Whether it goes in a Cron or in a validation is trivial. The actual logic of finding the duplicate is what I'm asking for. – Kyle Decot Jul 20 '11 at 07:02
  • If you're focusing on finding duplicates take a look at http://stackoverflow.com/questions/2531502/detect-similar-sounding-words-in-ruby/2533033#2533033 . It will solve the problem of detecting similar words in a simple way – lucapette Jul 20 '11 at 16:46

1 Answer


Levenshtein distance is as good a way as any of judging the similarity of two strings, i.e. the names.
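If you'd rather not depend on an old gem, here is a minimal pure-Ruby sketch of the standard dynamic-programming Levenshtein distance. The method name `levenshtein_distance` is hypothetical, not taken from any library; it counts the single-character insertions, deletions and substitutions needed to turn one string into the other.

```ruby
# Classic dynamic-programming Levenshtein (edit) distance: the minimum number
# of single-character insertions, deletions and substitutions needed to turn
# string a into string b. Only two rows of the table are kept in memory.
def levenshtein_distance(a, b)
  return b.length if a.empty?
  return a.length if b.empty?

  # previous[j] = distance between the first i-1 chars of a and first j chars of b
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i] # turning the first i chars of a into "" takes i deletions
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,        # deletion
        current[j - 1] + 1,     # insertion
        previous[j - 1] + cost  # substitution (or match when cost is 0)
      ].min
    end
    previous = current
  end
  previous.last
end
```

For example, `levenshtein_distance("Vancouver", "Vancuover")` is 2, because the swapped "ou" counts as two substitutions. What threshold counts as "similar" is up to you; comparing downcased names is usually a sensible start.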

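What I would suggest is to store the latitude and longitude separately (as well as, or instead of, a single "lat;long" string). Then you can do an SQL query to find other records that are within a certain distance, and only THEN run Levenshtein on their names. You want to run it as few times as possible because it's slow: each comparison costs time proportional to the product of the two name lengths, and comparing every record against every other record grows with the square of the number of records.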
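Here is a plain-Ruby sketch of that two-phase idea, with no Rails or database involved. `PlaceRow` is a hypothetical stand-in for a database row and `duplicate_candidates` is a hypothetical helper. The cheap lat/lng box check runs first, and the slow name comparison (passed in as a block, so you can plug in Levenshtein or anything else) only runs on the places that survive it. In the real app the first phase would be the SQL query.

```ruby
# Hypothetical stand-in for a database row: a name plus separate lat/lng floats.
PlaceRow = Struct.new(:id, :name, :lat, :lng)

# Phase 1 (cheap): keep only other places inside a lat/lng box around `place`.
# Phase 2 (expensive): run the name comparison, given as a block, on survivors only.
def duplicate_candidates(place, all_places, range)
  nearby = all_places.select do |other|
    other.id != place.id &&
      (other.lat - place.lat).abs < range &&
      (other.lng - place.lng).abs < range
  end
  nearby.select { |other| yield(place.name, other.name) }
end
```

Usage, with a case-insensitive exact match as the (deliberately simple) similarity rule:

```ruby
rows = [
  PlaceRow.new(1, "Corner Cafe", 37.7765, -122.4233),
  PlaceRow.new(2, "corner cafe", 37.7770, -122.4230),
  PlaceRow.new(3, "Corner Cafe", 40.7128, -74.0060) # same name, far away
]
duplicate_candidates(rows[0], rows, 0.0072) { |a, b| a.casecmp(b).zero? }
```

Only row 2 comes back, and the name block is called just once because row 3 never gets past the box check.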

Then you could do something like this: let's say your model name is "Place":

class Place < ActiveRecord::Base

  # Maximum edit distance for two names to count as "similar" (example value, tune to taste)
  MAX_NAME_DISTANCE = 2

  def nearby_places
    range = 0.005 # adjust this to get the proximity you want
    # lat and long are columns holding the latitude and longitude as floats
    # (LONG is a reserved word in MySQL, so quote it there or use a name such as lng)
    Place.where("id <> ? AND lat > ? AND lat < ? AND long > ? AND long < ?",
                id, lat - range, lat + range, long - range, long + range)
  end

  def similars
    nearby_places.select do |place|
      # assumes a gem exposing Levenshtein.distance(a, b), e.g. the levenshtein gem;
      # swap in whatever similarity rule suits your data
      Levenshtein.distance(name.downcase, place.name.downcase) <= MAX_NAME_DISTANCE
    end
  end

end

I've set range to 0.005, but I had no idea what it should be for half a mile, so let's work it out: Google says one degree of latitude is 69.13 miles, so half a mile in degrees is 1/(69.13 * 2), which gives about 0.0072. So 0.005 was not a bad guess, if a little tight :)
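The same arithmetic as a small Ruby sketch. `box_ranges` is a hypothetical helper, and the longitude part goes slightly beyond the answer: a degree of longitude spans roughly 69.13 * cos(latitude) miles, so the longitude half-width is the latitude half-width divided by cos(latitude).

```ruby
MILES_PER_DEGREE_LAT = 69.13 # figure quoted above; it varies slightly with latitude

# Half-widths, in degrees, of a box that reaches radius_miles from its centre
# both north-south and east-west, for a centre at latitude_deg.
def box_ranges(radius_miles, latitude_deg)
  lat_range = radius_miles / MILES_PER_DEGREE_LAT
  # longitude degrees shrink towards the poles, so widen the range to compensate
  lng_range = lat_range / Math.cos(latitude_deg * Math::PI / 180)
  [lat_range, lng_range]
end
```

At the equator both ranges are the same (about 0.0072 for half a mile); at 60° latitude the longitude range is twice the latitude range. Near the poles cos(latitude) approaches zero, so this simple box stops being useful there.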

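Note that my search logic returns places anywhere within a box centred on the current place, not a circle. With range set to about 0.0072 the box is roughly a mile from north to south, so places in its corners can be up to about 0.7 miles away. Also, a degree of longitude only covers about 69.13 * cos(latitude) miles, so the same range is narrower east to west (at 49° latitude, roughly two thirds of a mile across); to keep half a mile in that direction, divide the longitude range by cos(latitude). It's still fine as a quick way of narrowing things down to nearby places before the slower name comparison.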
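If you want to trim the box down to a true half-mile circle, you could run an exact distance check on the few places the SQL query returns. This is a minimal sketch using the haversine great-circle formula; `haversine_miles`, `within_radius?` and the mean Earth radius of 3958.8 miles are assumptions of the sketch, not part of the original answer. Geocoder, which the question already uses, ships its own distance helpers, so its documentation is worth checking before rolling your own.

```ruby
EARTH_RADIUS_MILES = 3958.8 # mean Earth radius; treating Earth as a sphere

# Great-circle distance in miles between two points given in degrees,
# using the haversine formula.
def haversine_miles(lat1, lng1, lat2, lng2)
  to_rad = Math::PI / 180
  dlat = (lat2 - lat1) * to_rad
  dlng = (lng2 - lng1) * to_rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlng / 2)**2
  2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(a))
end

# True when the two points are strictly closer than radius_miles.
def within_radius?(lat1, lng1, lat2, lng2, radius_miles)
  haversine_miles(lat1, lng1, lat2, lng2) < radius_miles
end
```

At the equator, a point 0.0072 degrees due north is about 0.497 miles away and passes the check, while the box corner 0.0072 degrees north and east is about 0.70 miles away and is rejected.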

Max Williams