3

Lets say we have the following array of strings (this array is a lot bigger):

[
  'http://www.example.com?id=123456',
  'http://www.example.com?id=234567'
]

As you can see, everything up to the first digit is the same in both strings. Is there a way to easily find what both strings have in common and what is different? So that I get a string like 'http://www.example.com?id=' and and array like ['123456', '234567'].

Severin
  • 8,508
  • 14
  • 68
  • 117
  • 2
    What should happen for `['http://www.example.com?id=123456', 'http://www.example.com?id=123457']`? – BroiSatse Aug 21 '14 at 12:13
  • Regexp might not be the ideal tool for this. Building a trie is straightforward, not horribly slow, and solves your problem. – Taemyr Aug 21 '14 at 12:13
  • It should output (as said in the question) 'http://www.example.com?id=', since both strings have it in common. – Severin Aug 21 '14 at 12:14
  • @Taemyr Could you give an example of how you would do that? – Severin Aug 21 '14 at 12:14
  • @Severin Never programmed in Ruby. http://stackoverflow.com/questions/9042426/explanation-of-ruby-code-for-building-trie-data-structures discussed it, but the answer invokes commands that I am not familiar with so I can't vouch for it's function. Trie's are a common data structure so it's fairly googleable. - Also note that it's a bit of an overkill for your problem. – Taemyr Aug 21 '14 at 12:21
  • @BroiSatse For your example it should output 'http://www.example.com?id=12345' – Severin Aug 21 '14 at 12:23
  • Post any Ruby/Regex you've `trie`d so far – skamazin Aug 21 '14 at 12:29
  • You can reuse my solution from this quotestion: http://stackoverflow.com/questions/25406102/longest-common-prefix-and-suffix-of-arrays. It was for an array, ut should work for strings as well – BroiSatse Aug 21 '14 at 12:32
  • Don't you want to parse the URL first? Multiple URL parameters (the key/value pairs after the `?`) can be in any order. Or do you not want to apply any URL logic here and use a strict substring comparison? – Mark Thomas Aug 21 '14 at 12:41
  • Simple string comparison will do the trick. – Severin Aug 21 '14 at 12:42

2 Answers2

2

Here's a method to find the longest common prefix in an array.

def _lcp(str1, str2)
  end_index = [str1.length, str2.length].min - 1
  end_index.downto(0) do |i|
    return str1[0..i] if str1[0..i] == str2[0..i]
  end
  ''
end

def lcp(strings)
  strings.inject do |acc, str|
    _lcp(acc, str)
  end
end


lcp [
  'http://www.example.com?id=123456',
  'http://www.example.com?id=234567',
  'http://www.example.com?id=987654'
]
#=> "http://www.example.com?id="

lcp [
  'http://www.example.com?id=123456',
  'http://www.example.com?id=123457'
]
#=> "http://www.example.com?id=12345"
Patrick Oscity
  • 53,604
  • 17
  • 144
  • 168
0
# This is an approach using higher level ruby std-lib components instead of a regex.
# Why re-invent the wheel?
module UriHelper
    require 'uri'
    require 'cgi'

    # Take an array of urls and extract the id parameter.
    # @param urls {Array} an array of urls to parse
    # @returns {Array}
    def UriHelper.get_id_params( urls )
        urls.map do |u| 
            puts u
            uri = URI(u)
            params = CGI::parse(uri.query)  
            params["id"].first # returned
        end
    end
end

require "test/unit"
# This is unit test proving our helper works as intended
class TestUriHelper < Test::Unit::TestCase
  def test_get_id_params
    urls = [
        'http://www.example.com?id=123456',
        'http://www.example.com?id=234567'
    ]
    assert_equal("123456", UriHelper.get_id_params(urls).first )
    assert_equal("234567", UriHelper.get_id_params(urls).last )
  end
end
max
  • 96,212
  • 14
  • 104
  • 165