
I am crawling a website and storing the results in SQLite. Since my crawling may repeat every month, I don't want to store duplicate results in my database. What should I do?

For example, the first time I will run:

insert into tableName(A, B, C, D, E) values(a,b,c,d,e) 

Then my database will have this record. The next time I crawl the website I will do the same thing, but I do not want to store duplicate results. What is the best way to do this?

Also, what should I do if I am using MySQL?

1a1a11a

3 Answers

You can use an insert-ignore statement:

SQLite

insert or ignore into tableName(A, B, C, D, E) values(a,b,c,d,e) 

MySQL

insert ignore into tableName(A, B, C, D, E) values(a,b,c,d,e) 

Be careful: the only difference between the two statements is the or keyword, but they are not interchangeable between SQLite and MySQL.

This will only work if the table has some kind of primary key or unique constraint.
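
For example, a minimal SQLite sketch, assuming (purely for illustration) that column A alone identifies a crawled record -- adjust the key to whatever really identifies a row in your data:

-- assumption: A alone identifies a crawled record
CREATE TABLE tableName (A INTEGER PRIMARY KEY, B INTEGER, C INTEGER, D INTEGER, E INTEGER);

INSERT OR IGNORE INTO tableName (A, B, C, D, E) VALUES (1, 2, 3, 4, 5);
-- the second insert is silently skipped because a row with A = 1 already exists
INSERT OR IGNORE INTO tableName (A, B, C, D, E) VALUES (1, 2, 3, 4, 5);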

Using sqlite, create your table with a UNIQUE constraint:

CREATE TABLE tableName (A INTEGER, B INTEGER, C INTEGER, D INTEGER, E INTEGER,
UNIQUE (A, B, C, D, E));

then

INSERT OR IGNORE INTO tableName (A, B, C, D, E) values (a,b,c,d,e);

will only insert a new record if (a,b,c,d,e) is unique -- i.e. not already present in the table tableName.
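
For instance, a quick sketch against the table above, with literal values standing in for a..e:

INSERT OR IGNORE INTO tableName (A, B, C, D, E) VALUES (1, 2, 3, 4, 5);
-- ignored: an identical row is already present
INSERT OR IGNORE INTO tableName (A, B, C, D, E) VALUES (1, 2, 3, 4, 5);
-- returns 1, not 2
SELECT COUNT(*) FROM tableName;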

To add a UNIQUE index to an already existent table:

CREATE UNIQUE INDEX unique_idx on tableName(A, B, C, D, E);

or, using MySQL, create a table with a unique index:

CREATE TABLE tableName (A INT, B INT, C INT, D INT, E INT,
UNIQUE KEY unique_idx (A, B, C, D, E))

then use INSERT IGNORE:

INSERT IGNORE INTO tableName (A, B, C, D, E) values (a,b,c,d,e) 

To add a UNIQUE KEY index to an already existent table:

ALTER TABLE tableName ADD UNIQUE KEY unique_idx (A, B, C, D, E); 
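
As a rough sketch of what you would see, MySQL skips the duplicate and reports it as a warning rather than an error (the exact message text may vary by version):

INSERT IGNORE INTO tableName (A, B, C, D, E) VALUES (1, 2, 3, 4, 5);
INSERT IGNORE INTO tableName (A, B, C, D, E) VALUES (1, 2, 3, 4, 5);
-- shows something like: Duplicate entry '1-2-3-4-5' for key 'unique_idx'
SHOW WARNINGS;
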
unutbu

You can create a temporary/auxiliary table, say temp_tableName, having the same schema as tableName. Then, while crawling, perform something like:

insert into temp_tableName select * from tableName;

delete from tableName;

insert into tableName(A, B, C, D, E) values(a,b,c,d,e)  

The auxiliary table is needed because, if the crawling insert fails, you have a backup to restore from. Also, consider doing these DELETE/INSERT steps in a transaction block for atomicity, as sketched below.
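
A rough sketch of that, reusing the same statements wrapped in a transaction (BEGIN/COMMIT works in SQLite; MySQL also accepts START TRANSACTION):

BEGIN;
insert into temp_tableName select * from tableName;
delete from tableName;
insert into tableName(A, B, C, D, E) values(a,b,c,d,e);
COMMIT;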

Rahul
  • This helps, but is there any way that I can just drop the duplicate if I find it is already in the table? My initial thought was to do select * from tableName where A=a AND B=b AND C=c AND D=d AND E=e and check whether such a record exists (a sketch of this check appears below the comments). Is that a good way to do it? – 1a1a11a Jun 07 '15 at 18:53
  • Depends. If you are performing row-by-row inserts then probably yes, but it would be a bit costly. Since the insert happens from a crawler, I believe it would be a bulk insert; I answered keeping that in mind. – Rahul Jun 07 '15 at 18:54
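
For reference, a minimal sketch of the check-then-insert idea from the first comment, with literal values standing in for a..e (SQLite syntax; MySQL would need FROM DUAL after the SELECT). Note that with a UNIQUE constraint in place, INSERT OR IGNORE does the same thing in one statement:

INSERT INTO tableName (A, B, C, D, E)
SELECT 1, 2, 3, 4, 5
WHERE NOT EXISTS (
    SELECT 1 FROM tableName
    WHERE A = 1 AND B = 2 AND C = 3 AND D = 4 AND E = 5
);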