1

I'm new to DynamoDb and generally new to NoSql databases. I'm currently playing arround with a simple webapplication which is written in asp.net core 6 an run on aws lambda.

I have posts which can be in different channels. Users can make comments on posts. Posts and Comments can be up/down voted.

When trying to implement voting on comments I started getting the feeling, that my design is quite bad after reading for example the following question: Update nested map dynamodb

The issue is that:

  • I can't update votes of a comment without knowing the array index
  • the array index is prone to race conditions when an older comment (with lower index therefore) is deleted

Current data structur (3 tables):

guids are saved as strings

post: {
    pk -> id
    sk -> id
    id : guid
    createdAt : date
    createdBy : guid // user.id
    comments: [
        {
            id : guid
            text : string
            createdAt : date
            createdBy : guid // user.id
            upvotedBy : guid[] // user ids
            downvotedBy : guid[] // user ids
            upvotes : int // aggregated count of above list
            downvotes : int
        },
        { ... },
        { ... },
        { ... },
    ]
    commentCount : int
    upvotedBy : guid[] // user ids
    downvotedBy : guid[] // user ids
    upvotes : int // aggregated count of above list
    downvotes : int
    channel : guid // channel.id
    hidden : bool
}

user: {
    pk -> id
    sk -> id
    id : guid
    username : string
    password : string
    createdAt : date
    subscribed_channels : guid[] // channel ids
}

channel: {
    pk -> id
    sk -> id
    id : guid / string
    channelname : string
}

I have additional keys for:

    - posts
        - ChannelIndex   :   pk = channelname,  sk = createdAt
        - CreatedAtIndex :   pk = pk,           sk = createdAt
    - users
        - UsernameIndex :    pk = username,     sk = sk
    - channels
        - ChannelnameIndex : pk = channelname,  sk = sk

Usecases (ordered by frequency):

  • Access all posts without comments
    • from multiple channels
    • from single channels
      • sorted by date
      • sorted by most comments
      • sorted by most likes
    • from all channels
  • Access single post with (all) comments
  • vote on comment
  • vote on post
  • create comment
  • create post
  • delete comment
  • delete post

Question:

How to improve this design from a performance but also from the ease of implementation side?

My ideas: I thought about removing the nesting for comments making them a own table with a key beeing postId and createdAt However this feels like the design for a relational database.

Fritz
  • 831
  • 7
  • 23
  • 1
    What do you mean by `pk -> id` and `sk -> id` in post, user, and channel? Why would `user` have an SK, for example? – jarmod Jan 07 '23 at 16:13
  • 1
    Deeper conversation at [DynamoDB Modelling - Sorted nested comment threads and ranked Post feeds](https://www.reddit.com/r/aws/comments/wcy102/dynamodb_modelling_sorted_nested_comment_threads/). Also, I note that you are using GUIDs but be aware of the potential value of ULIDs or TSIDs for globally-unique IDs that are inherently time-sorted (which will potentially save you from needing GSIs and could reduce your storage requirement). – jarmod Jan 07 '23 at 16:29
  • @jarmod `pk -> id` and `sk -> id` mean that they are filled with the value of `id`. I did not know that sk is not mantatory this is a good hint, especially when searching by id. also replacing GUIDs is really helpfull :) the reditpost is much more complicated than my design. I don't have this difficult nesting, but it also implies that in my case i just should implement comments like in a relational database? – Fritz Jan 08 '23 at 14:21
  • 1
    I think you could do this with either NoSQL (e.g. DynamoDB) or standard RDBMS. If going the NoSQL route, you might consider Single Table design. I personally think that the hierarchical `downvotedBy:guid[]` idea is not a good one in DynamoDB for a couple of reasons: 1) it's very difficult to support concurrent updates and 2) item sizes are limited in DynamoDB so at some scale (e.g. 10000+ downvotes) you're going to run out of space in your item. – jarmod Jan 08 '23 at 18:13
  • Thanks for your comments :) I thought with `UpvotedBy = list_append(UpvotedBy, :userIdList)` I get around the concurrent update issue. I do not expect more than 1000 votes since this is a hobby project. Therefore I chose NoSql to learn something new. I'm sure RDBMS would be a way better fit for this schema in many ways. I'm currently implementing comments in a sperate table, with an additional Index for the parentPost – Fritz Jan 08 '23 at 20:16
  • 1
    Can voters rescind their vote? You'd have to retrieve the item, search through the list of upvoter IDs, remove the relevant voter ID, and rewrite the item attribute. Not necessarily difficult but requires some form of transaction/lock. If the list of voter IDs are all unique then use a string set rather than a list of strings. That's simpler, faster, and avoids problems with duplicates. Related: see [here](https://medium.com/hackernoon/safe-list-updates-with-dynamodb-adc44f2e7d3). – jarmod Jan 09 '23 at 16:04
  • @jarmod votes are not allowed to change. I implemented it with sets instead of lists for fast lookups. With this solution I would also be able to retrieve them. Is it a valid approach to merge post and comments to one table? `pk` is always the id of the post. for comments the `sk` will be prefixed with `c_` following its commentId. I switched to sortable IDs (Mongo.ObjectId). However I would need to have a index with only the pk to retrive all comments (+mainPost) per postId – Fritz Jan 09 '23 at 17:46
  • 1
    I would say yes on having both posts and comments in the same table. That's commonly done (single table design). For example: pk=postid, sk=postid for posts, pk=postid, sk=comment#commentid for comments. Then a simple query on pk=postid (without sk) gets you both the post and all associated comments. I think you're proposing `c_` instead of `comment#` which makes sense as it save storage and achieves exactly the same goal. – jarmod Jan 09 '23 at 20:09
  • exactly since my ids do not contain underscores. One issue I had was that I cant query only by using the pk if the main key of the table is (pk+sk). Do i need another key for this? (my issue was similar to: https://stackoverflow.com/questions/42757872/the-provided-key-element-does-not-match-the-schema-error-when-getting-an-item) – Fritz Jan 10 '23 at 15:54
  • 1
    When getting an item, you need the full primary key. For a composite key that means both pk+sk. But when [querying](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Query.html) for items you need pk and you can optionally include the sk or any prefix of the sk (assuming the table has a composite key, of course). That's what allows you to query for pk=postid, sk=c_ to return all the comments for a specific post. Also, it should go without saying, but don't ever use scan unless a) it's a small table or b) you're doing some kind of backup or export of the table. – jarmod Jan 10 '23 at 16:25
  • @jarmod thanks that's a really helpful explanation. I only use scan for admin stuff. I think that's cheaper than having keys for filters only admins can use. – Fritz Jan 10 '23 at 22:37

0 Answers0