0

I have 50,000 user records that contain a field booklist with a list of comma separated bookIDs. I have a book table with bookID as the primary key with about 100,000 records.

I want to loop over all the users and their booklist to find out if there is a matching bookID in the book table. If it is not in the book table, I need to insert that bookID into the book table.

Can this be done with SQL alone?

If I use PHP, would I be better off combining all the lists of bookID into a long string or array and then looping over all bookID in the book table to find the missing ones?

This is a one time script, so the simpler the better ;-)

0x1F602
  • 865
  • 5
  • 22
uwe
  • 3,938
  • 11
  • 37
  • 50
  • 3
    Even though it's a tad too late to change, you've approached the problem wrong. You should have 3 tables: `users`, `books` and `booklist`, where the booklist have a many-to-many relationship between the books and users. – Madara's Ghost Feb 01 '12 at 20:08
  • 1
    possible duplicate of ["How to search for a comma separated value"](http://stackoverflow.com/q/3852443/), ["How can I match a comma separated list against a value?"](http://stackoverflow.com/q/1142246/), ["Select statement using IN operator"](http://stackoverflow.com/q/834074/). See also ["Is storing a comma separated list in a database column really that bad?"](http://stackoverflow.com/q/3653462/) – outis Feb 01 '12 at 21:10

3 Answers3

2

Well, you shouldn't have a comma separated, multi-valued field, and that would prevent a total SQL solution. I'm afraid you're in too deep now, right?

Let's look at your problem.

I want to loop over all the users and their list of bookIDs to find out if there is a matching bookID in the book table. If not I need to insert that bookID into the book table

Since you did this the silly way, start by:

  1. Selecting every user

  2. Exploding their booklist field into individual bookIDs

  3. Add these to a distinct (without repetition) set of bookIDs.

  4. Once you have every bookID every user has ever had in that field, loop through the book table. If you do not find any of the bookID's from your distinct set, insert those bookIDs to the book table.

What should have happened is that you should have had a separate table "booklist" with an id as a PK, which would have each userID and each bookID. Then you can pull every userID and therefore each bookID that userID has and vice versa.

Am I making sense?

edit: I also realized you should probably sort and search to be as efficient as possible, but you did mention that this is a one-time only program...

0x1F602
  • 865
  • 5
  • 22
  • thanks, that did the trick. I had decided to not follow good database design since those record were only for audit purposes and I wasn't planning on running queries against them. When I had to it turned out to be a major PITA ;-( Thanks everyone for the help! – uwe Feb 01 '12 at 23:55
1

As others have already said, having a comma-separated list of IDs on a database column should be avoided.

But since you already have it that way, you could use mysql FIND_IN_SET. It does string comparison, which will probably be slow, and works like this: you pass two strings to FIND_IN_SET, the first is the ID you are searching for, and the second is the comma-separated list. The function will return zero if the string is not found on the list, or the position of the string on the list if it's found.

I'm sorry that I can't give you a complete solution for your problem right now, but I hope this helps.

bfavaretto
  • 71,580
  • 16
  • 111
  • 150
0

Sorry for not answering your question, but the real answer is a db design like this. Look how simple and extensible for growth and querying. enter image description here

The association table (assoc_users_booklist) contains most of the records, with a many join on both uid and bid. The users table associates one to many on id = uid. The books table associates one to many on id = bid.

jamesTheProgrammer
  • 1,747
  • 4
  • 22
  • 34
  • This doesn't answer the question. It just tells them to re-do the database. It actually doesn't even address finding the values that are not found in the `assoc_user_booklist` – 0x1F602 Feb 01 '12 at 23:10