1

I was asked in a job interview to design the database for a dictionary that keeps tracks of synonyms. The synonym part was a later question in the interview. I had a table like

    Words
=============
id |  ...
=============
 1 |   .
 2 |   .
 . |   .
 . |   .

and so to relate words to words I made a table

       Synonyms 
======================
 wordA_id | wordB_id
======================
     .    |    .
     .    |    . 
     .    |    .

For example, if words with id equal to 1, 69 and 72 are synonyms then

     1    |    69
     69   |    72

might be an entry in Synonyms. But so might

     1    |    69
     1    |    72

I realized. And the pairing order doesn't matter, so

     69   |   1
     1    |   72

words just as well. And we probably don't want duplicate records of any order, so something like

     69   |   1
     1    |   72
     69   |   1

of

     69   |   1
     1    |   72
     1    |   69

I'm wondering the best way to configure the database to ensure all this. What I suggested (in the interview) is inserting synynoms through a sproc that checks all this, like

CREATE PROCEDURE InsertSynonym
   @idA INT,
   @idB INT
AS
   -- make @idA the smaller if it isn't already
   IF @idA > @idB 
   BEGIN
      @DECLARE @temp INT
      SET @temp = @idA
      @idA = @idB
      @idB = @temp
   END
   -- insert if not already there
   IF NOT EXISTS (SELECT * FROM Synonyms WHERE wordA_id=@idA)
   BEGIN
      INSERT INTO Synonyms (wordA_id,wordB_id) VALUES (@idA,@idB)
   END
  • Are there any bugs in my T-SQL syntax?
  • Assuming that the only way synonyms are inserted is through this procedure, does the logic cover all the cases and does it keep the size of the Synonyms table from exploding?
  • Any better way to design this synonyms pattern to begin with?
  • By the way, is it possible to do an XOR swap in T-SQL?
user6048670
  • 2,861
  • 4
  • 16
  • 20
  • Want to point out if you’re still doing interviews, you would probably want to add to your synonyms table schema some numeric `rank` column for sorts; think about when you look at a thesaurus, the synonyms for a given word are usually listed in order of how similar their meaning is to your word. This also makes your bidirectional map optimization not work, an optimization I’d say is unnecessary. – Jony Thrive Jan 18 '18 at 00:26

2 Answers2

1

There is one bug I see but it's not in syntax, it's in logic. The IF NOT EXISTS should contain in it's where clause both wordA_Id and wordB_Id.
Also, it's not clear from your description of the database, but the words in words table should have a unique index on them, as well as the combination of both columns in the Synonyms table.

I'm not sure if a better database design exists, but there is another option and that's using a self-reference in the words table. However, this would require the self-reference column to be nullable, and while it is perfectly supported by all major databases, it's frowned upon.
To answer your question about xor swap - it is possible in t-sql:

DECLARE @A int = 5, @B int = 7;

SELECT @A as A, @B as B

SELECT  @A = @A ^ @B,
        @B = @A ^ @B,
        @A = @A ^ @B

SELECT @A as A, @B as B

results:

A           B
----------- -----------
5           7


A           B
----------- -----------
7           5
Community
  • 1
  • 1
Zohar Peled
  • 79,642
  • 10
  • 69
  • 121
0

Maybe you can add a new field on Word table named SynonymId referencing a new table named Synonyms

In this Synonyms table you can keep SynonymId (non-unique), WordId

So when you want to get the list of synonyms of a word, read the field SynonymId of that row

Query Synonyms with same SynonymId joining Words over WordId

Here is how to select synonyms list can be queried with table structure and sample data

create table words(wordid int, word varchar(100), synonymid int)
create table synonyms(synonymid int, wordid int)
insert into words values (1, 'synonym', 1), (2, 'equivalence', 1), (3, 'opposite', null), (4, 'sameness', 1), (5, 'similarity',1 )
insert into synonyms values(1,1),(1,2),(1,3),(1,4)


declare @word varchar(100) = 'sameness'

SELECT
@word as word,  STUFF(
    (
select
    ',' + sw.word
from words w
left join [synonyms] s on s.synonymid = w.synonymid
left join words sw on s.wordid = sw.wordid
where
    w.word = @word
FOR XML PATH('') 
), 1, 1, ''
  ) As synonyms

Here is how to concatenate strings in SQL representing synonyms

Eralper
  • 6,461
  • 2
  • 21
  • 27