0

I have two tables that I want to join Domain and Links with 1k rows and with 700k rows respectively

create table Domain (url varchar(255), tag varchar(255));
create table Links (ShortURL varchar(255), LongURL varchar(255));

The output should be joined on partial match of ALL strings Domain.url found in Links.LongURL and the result should return 700k rows from the Links table and where no match found should be null

Tried the following from this thread and here

SELECT ShortURL,LongURL,tag
FROM Links fulljoin Domain 
ON Links.LongURL LIKE concat('%', Domain.url, '%');
Community
  • 1
  • 1
Seun AJAO
  • 611
  • 1
  • 9
  • 15

3 Answers3

2

This is going to be a time consuming operation. Your query is on the right track, but the correct syntax is:

SELECT ShortURL, l.LongURL, tag
FROM Links l LEFT JOIN
     Domain d
     ON l.LongURL LIKE concat('%', d.url, '%');

That said, with your two tables, this query is going to take a while. It needs to do about 700,000,000 comparisons using like. There is no easy way to speed up the query.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
1

In order to return all Links regardless of whether there is a matching Domain, you can use LEFT OUTER JOIN.

To make sure each Links row is returned only once, you should GROUP BY the Links columns and just return one Domain (using an aggregate function like MIN() or MAX()) in case there are multiple Domain matches for a given Links row.

Something like this:

select Links.ShortURL, Links.LongURL, min(Domain.tag)
from Links 
  left outer join Domain on Links.LongURL like concat('%', Domain.url, '%')
group by Links.ShortURL, Links.LongURL

Your existing LIKE statement will be very slow, and cannot benefit from an index on Links.LongURL. How are you storing the URLs and domains? If they have a consistent prefix, for example all Links.LongURL and Domain.url values start with https://, then you could add an index on Links.LongURL, get rid of the leading wildcard, and make the query much faster, like this:

select Links.ShortURL, Links.LongURL, min(Domain.tag)
from Links 
  left outer join Domain on Links.LongURL like concat(Domain.url, '%')
group by Links.ShortURL, Links.LongURL
Ike Walker
  • 64,401
  • 14
  • 110
  • 109
1

If the DOMAIN Name start every time at the begin of the string you can speed up like this. than can MySQL use a INDEX:

SELECT ShortURL,LongURL,tag
FROM Links fulljoin Domain 
ON
  Links.LongURL LIKE concat(Domain.url, '%')
OR
  Links.LongURL LIKE concat('www.',Domain.url, '%');
Bernd Buffen
  • 14,525
  • 2
  • 24
  • 39