7

I am working on a platform where unique user ID's are Identity ID's from a Amazon Cognito identity pool. Which look like so: "us-east-1:128d0a74-c82f-4553-916d-90053e4a8b0f"

The platform has a MySQL database that has a table of items that users can view. I need to add a favorites table that holds every favorited item of every user. This table could possibly grow to millions of rows.

The layout of the 'favorites' table would look like so:

userID, itemID, dateAdded

where userID and itemID together are a composite primary key.

My understanding is that this type of userID (practically an expanded UUID, that needs to be stored as a char or varchar) gives poor indexing performance. So using it as a key or index for millions of rows is discouraged.

My question is: Is my understanding correct, and should I be worried about performance later on due to this key? Are there any mitigations I can take to reduce performance risks?

My overall database knowledge isn't that great, so if this is a large problem...Would moving the favorited list to a NoSQL table (where the userID as a key would allow constant access time), and retrieving an array of favorited item ID's, to be used in a SELECT...WHERE IN query, be an acceptable alternative?

Thanks so much!

3 Answers3

6

Ok so here I want to say why this is not good, the alternative, and the read/write workflow of your application.

Why not: this is not a good architecture because if something happens to your Cognito user pool, you cant repopulate it with the same ids for each individual user. Moreover, Cognito is getting offered in more regions now; compare to last year. Lets say your users' base are in Indonesia, and now that Cognito is being available in Singapore; you want to move your user pools from Tokyo to Singapore; because of the latency issue; not only you have the problem of moving the users; you have the issue of populating your database; so your approach lacks the scalability, maintainability and breaks the single responsibility principle (updating Cognito required you to update the db and vica versa).

Alternative solution: leave the db index to the db domain; and use the username as the link between your db and your Cognito user pool. So:

Read work flow will be:

  1. User authentication: User authenticates and gets the token.

  2. Your app verifies the token, and from its payload get the username.

  3. You app contacts the db and get the information of the user, based on the username.

  4. Your app will bring the user to its page and provides the information which was stored in the database.

Write work flow will be:

  1. Your app gets the write request with the user with the token.

  2. verifies the token.

  3. Writes to the database based on the unique username.

o-0
  • 1,713
  • 14
  • 29
  • Another potential issue is that if by any chance you add more federated identities the ID is directly tied to each identity, so if you have a user singing in in different ways he's gonna have records associated to him with different IDs but for the same user in the user pool. – 8vius Jan 07 '19 at 18:08
2

Regarding MySQL, if you use the UserID and CognitoID composite for the primary key, it has a negative impact on query performance therefore not recommended for a large dataset.

However using this or even UserID for NoSQL DynamoDB is more suitable unless you have complex queries. You can also enforce security with AWS DynamoDB fine-grained access control connecting with Cognito Identity Pools.

Ashan
  • 18,898
  • 4
  • 47
  • 67
0

While cognito itself has some issues, which are discussed in this article, and there are too many to list...

It's a terrible idea to use cognito and then create a completely separate user Id to use as a PK. First of all it is also going to be a CHAR or VARCHAR, so it doesn't actually help. Additionally now you have extra complexity to deal with an imaginary problem. If you don't like what cognito is giving you then either pair it with another solution or replace it altogether.

Don't overengineer your solution to solve a trivial case that may never come up. Use the Cognito userId because you use Cognito. 99.9999% of the time this is all you need and will support your use case.

Specifically this SO post explains that there is are zero problems with your approach:

There's nothing wrong with using a CHAR or VARCHAR as a primary key.

Sure it'll take up a little more space than an INT in many cases, but there are many cases where it is the most logical choice and may even reduce the number of columns you need, improving efficiency, by avoiding the need to have a separate ID field.

Warren Parad
  • 3,910
  • 1
  • 20
  • 29