0

I have an SQL table, with a list of packages (order of 10000 unique entries) and the corresponding categories of any given package (order of 100 unique entries). A given package can be part of more than one category (order of 15000 unique combinations, which is thus the size of the table).

All package names are supposed to be unique and not conflict with category names, however, that doesn't appear to be the case.

What I need is figure out if any of the 10k packages has the same name as any of the 100 categories, and, if so, amend all such package names to include a ,pkg suffix, to ensure that access to the two namespaces could be shared, and remain unique.


The simplest case would be this:

sqlite> select * from categories where value=fullpkgpath;
lang/mono|lang/mono

sqlite> select fullpkgpath, value from categories where fullpkgpath="lang/mono";
lang/mono|devel
lang/mono|lang
lang/mono|lang/mono

Which should be converted into this:

lang/mono,pkg|devel
lang/mono,pkg|lang
lang/mono,pkg|lang/mono

However, there could also be cases where there is a more indirect conflict (a given package conflicting with a category that it's not part of), like this:

sqlite> select * from categories where fullpkgpath="lang/erlang" or fullpkgpath="lang/node";
lang/erlang|lang
lang/node|devel
lang/node|lang
lang/node|lang/python

Which should be converted into this:

lang/erlang,pkg|lang
lang/node,pkg|devel
lang/node,pkg|lang
lang/node,pkg|lang/python

Since both lang/erlang and lang/node are themselves categories, too:

sqlite> select * from categories where value="lang/erlang" limit 8;
databases/erl-Emysql|lang/erlang
databases/erl-couchbeam|lang/erlang
databases/erl-epgsql|lang/erlang
databases/erl-sqerl|lang/erlang
devel/erl-automeck|lang/erlang
devel/erl-bear|lang/erlang
devel/erl-depsolver|lang/erlang
devel/erl-ej|lang/erlang

sqlite> select * from categories where value="lang/node" limit 8;
databases/node-pg|lang/node
databases/node-sqlite3|lang/node
devel/node-async|lang/node
devel/node-bindings|lang/node
devel/node-buffer-writer|lang/node
devel/node-cloned|lang/node
devel/node-expresso|lang/node
devel/node-fibers|lang/node

I'm using sqlite3 and perl on OpenBSD. The database, if needed, is available on ftp.

cnst
  • 25,870
  • 6
  • 90
  • 122
  • Why not just append `,pkg` to the end of *every* package name? – ThisSuitIsBlackNot Jan 17 '14 at 17:53
  • Also, why do the names have to be unique? You have them in separate columns for a reason. – ThisSuitIsBlackNot Jan 17 '14 at 17:57
  • They have to be unique in order to be uniquely accessible from a single namespace elsewhere. Adding `,pkg` to every package is ugly -- there is only like an order of 10 packages that have a conflicting name, so, it'd be unfair to the rest of the 10000 packages to suffer a renaming because of that. – cnst Jan 17 '14 at 18:08
  • The point of my second comment was that the application making database queries knows which things are packages and which are categories, because they're in separate columns. You haven't shown the code that requires these names "to be uniquely accessible from a single namespace elsewhere," so I'm not convinced you need to alter the data. I could very well be wrong, I'm just saying I'm not convinced :-) – ThisSuitIsBlackNot Jan 17 '14 at 18:18
  • @ThisSuitIsBlackNot, the site that it powers is statically generated, and right now it's essentially missing 3 of about 9000 pages due to this bug. – cnst Jan 17 '14 at 18:21

1 Answers1

2

To check whether a value is in a set of values, use IN:

UPDATE categories
SET fullpkgpath = fullpkgpath || ',pkg'
WHERE fullpkgpath IN (SELECT value
                      FROM categories)
CL.
  • 173,858
  • 17
  • 217
  • 259
  • works great, thanks! I have a related follow-up question, if you don't mind: http://stackoverflow.com/questions/21195043/sql-update-a-given-field-in-all-tables-instead-of-just-one-table – cnst Jan 17 '14 at 20:16