1

Essentially I am using a python library to query for synonyms of a word, and then adding that word, along with its synonyms to a postgreSQL table. It is fine if some of the table columns are empty, but I do not know how to insert the synonyms to the database because I don't know how many there will be.

E.g I have a 6 column table, 1 for the original word, 5 for its synonyms. For one word I may get 3 synonyms, but for others I might get all 5.

The only way I've managed to program this task is a horrible group of if statements as so:

for word in textlist:
syns = dictionary.synonym(word)
num = len(syns)

if len(syns) == 5:
    for i in syns:
        syn1 = i[0]
        syn2 = i[1]
        syn3 = i[2]
        syn4 = i[3]
        syn5 = i[4]

And so on... and then use those variables to insert the synonyms into the database table.

One way I was thinking of was to build another list of the word and synonyms and loop through that to add to the table. But I have no idea how to go about that. Any help on how to push further would be a big help. Thank you.

Note: syns returns a list object.

EDIT:

So thanks to @systemjack, I've changed the system in such a way that there are 3 table columns. id (which is a serial primary key), word and synos (originally syn, but changed just to make sure it isn't some weird keyword). Each word should be added in a row along with one synonym and then the next row could be the same word with a different synonym and so on...Anyway the new method is as follows:

for word in textlist:
    syns = dictionary.synonym(word)
    if syns is not None:
        for syn in syns:
            cursor.execute('INSERT INTO wordsyn (word, "synos") VALUES (%s, %s);', (word, syn))

I now however get the error psycopg2.ProgrammingError: column "synos" of relation "wordsyn" does not exist LINE 1: INSERT INTO wordsyn (word, "synos") VALUES ('&c

I've put double quotes around synos in an attempt to make it work, but it doesn't work without them either. Any further help would be greatly appreciated.

\d+ wordsyn outputs:

\d+ wordsyn creates outputs:

FINAL EDIT: Found the problem. Didn't have the table in the correct db. I thought \dt only showed tables in your current database, but apparently not.

Generic Snake
  • 575
  • 1
  • 5
  • 13
  • If changing your database structure is a possibility then we can easily come up with a proper solution. Is that an option? – gipsy Mar 03 '17 at 02:59
  • @gipsy It can be yes. I haven't particularly mapped it out in any way. But it somewhat depends on what you recommend. – Generic Snake Mar 03 '17 at 03:52
  • If you give an example use case for the resulting tables it would help to figure out the best schema. – systemjack Mar 03 '17 at 17:18
  • If you are using psql would you add the output of `\d+ wordsyn`? – systemjack Mar 04 '17 at 01:01
  • @systemjack added – Generic Snake Mar 04 '17 at 01:15
  • What postgres library are you using? – systemjack Mar 04 '17 at 01:54
  • psycopg2 to connect with python, and it's psql version 9.5.6 (I think that's what you're asking) – Generic Snake Mar 04 '17 at 01:57
  • 1
    I'm stumped. Everything looks right to me. Does a manual insert with psql work? LIke: `insert into wordsyn (word, synos) values ('foo', 'bar');` – systemjack Mar 04 '17 at 20:13
  • @systemjack Yep, that works. I'm going to check to make sure it's not something wrong with the way the python library returns the synonyms. Maybe that's affecting it somehow. – Generic Snake Mar 04 '17 at 23:32
  • @systemjack So I worked out the issue, and it's a really silly one. Essentially, when I created the table, I created it inside the initial database you come to when starting postgres, and not the one I had made for this table. I used \dt in the correct database and it showed the table. I thought that \dt only showed tables in your current database! Changing the db position fixed the problem. Thank you for your help, wouldn't have gotten this far without it :D ! – Generic Snake Mar 09 '17 at 05:35

2 Answers2

3

A typical relational approach would set that up as a two column table with a word column and a second synonym column. Then you'd have a row for each word-synonym combination.

Your queries that make use of the data are a lot more straightforward in that case. You really don't want to be dealing with a variable number of populated columns. Also, what happens if there are 6 synonyms for a word?

The simplest implementation requires multiple inserts per word, but that's not a big deal. There are also ways to batch up the inserts as well.

If you really want to have one row per word you can use a combination of SPLIT_PART and STRING_AGG to store variable numbers of values in one column. I highly recommend going with the former approach though.

You can also manage more complex dynamic structures as JSON.

Keep in mind that both these last two approaches would be much slower.

With the first method your insert would be something like:

for word in textlist:
    syns = dictionary.synonym(word)
    for syn in syns:
        cursor.execute('insert into syns (word, syn) values (%s, %s);', (word, syn))

Then if you wanted to see all the synonyms for a word you could do something like:

select word, string_agg(syn, ', ') 
from syns where word='perspicuitous'
group by word;
Community
  • 1
  • 1
systemjack
  • 2,815
  • 17
  • 26
  • I think you say col1 is word, col2 is a syn for that word with multiple rows for a word as necessary. But then col1 cannot be a primary key. – Marichyasana Mar 03 '17 at 08:41
  • Yep. Corrected. Thanks. – systemjack Mar 03 '17 at 17:13
  • @systemjack Thanks for the reply. This looks like it will work great. I'll get back to you when I've tried it out. – Generic Snake Mar 03 '17 at 19:47
  • @systemjack Ok, so I think this will work. I created a table with 3 columns instead (called wordsyn), one column is an automatically generating id (i'll need it for later) the other two are as you said, one for the word - called word - and one for the synonym - called syn. My code follows your example, but I get the error `psycopg2.ProgrammingError: column "syn" of relation "wordsyn" does not exist` I'm not sure what this could be as that is definitely the name of the column. – Generic Snake Mar 04 '17 at 00:00
  • If you do `select word, "syn" from wordsyn limit 1;`, does it work? If not try `select * from wordsyn limit 1;` and check out what the headers are. – systemjack Mar 04 '17 at 00:10
  • @systemjack yep, tried in the command console and they both work. first one returns just the word and syn columns, the 2nd returns id, word and syn columns. – Generic Snake Mar 04 '17 at 00:12
  • Try double quoting it in the original query...maybe Postgres has a strange reserved word I'm not aware of. If not then update the question with the query you're running. – systemjack Mar 04 '17 at 00:12
  • @systemjack Still not working. I've updated the question. Thank you for your help so far. :) – Generic Snake Mar 04 '17 at 00:40
0

One approch will be to create 2 tables One for word and another for the synonyms like below

Word:

id  - primary key - could be autogenerated.
word - sring - To store the word

Synoym:

id -- primary key - could be autogenerated.
word_id --  Foreign key to Word table id
synonym --  String - The synonym 

For each word, lookup in Word table if word exists , if not insert it, else just grab the id of the existing entry. Get the synonyms for the word using dictionary.synonym(word). For each synonyms lookup the synonym table with word id from above and the synonym, if not exists insert else move on to next synonym.

Repeat this for all words.

Selecting synonyms for a word become a query like below:

select syn.synonym from synonym syn , word wd 
where wd.id = syn.word_id and wd.word = 'foo'
gipsy
  • 3,859
  • 1
  • 13
  • 21