Apache Cassandra schema design

Question

I have the following setup:
I have a CF items and a CF keywords.
Each item has zero, one or more keywords, stored in columns.
Each keyword has one or more items, stored in columns.
It looks like this:


    items {
        dl { name => DELL6400,  keyword:1 => computer, keyword:2 => DELL, keyword:3 => topseller  }
        hp { name => HP12345,   keyword:1 => computer, keyword:2 => HP    }
        no { name => Nokia8210, keyword:1 => phone,    keyword:2 => NOKIA }
    }

    // here I store keys of the items only,
    // in reality I have denormalized most of items columns
    keywords{
        computer  { webpage => www.domain.com/computer , item:dl => dl , item:hp => hp }
        DELL      { webpage => www.domain.com/dell ,     item:dl => dl }
        topseller { webpage => www.domain.com/top ,      item:dl => dl }
        HP        { webpage => www.domain.com/hp ,       item:hp => hp }
        NOKIA     { webpage => www.domain.com/nokia ,    item:no => no }
        phone     { webpage => www.domain.com/phone ,    item:no => no }
    }

When I add a new item, I add a "webpage" column to keywords if necessary.
When I remove an item, I remove the "item:xx" column as well.

The question is how to avoid "empty" keywords, such as the ones left behind if I remove the Nokia item "no":


    keywords{
        ...
        NOKIA     { webpage => www.domain.com/nokia }
        phone     { webpage => www.domain.com/phone }
    }

I could count the item:* slice, but because of eventual consistency this is probably the wrong approach.

What if I denormalize the "webpage" column for each item:xx column? – Nick, May 31 '12 at 18:40

Wildfire · Answer 1 · 2012-06-01T06:56:56.777

1

You can add a CounterColumn (http://wiki.apache.org/cassandra/Counters) to the keywords CF. Increment it when adding an item to the keyword, and decrement it on removal:

    keywords{
        computer  { webpage => www.domain.com/computer , count => 2 , item:dl => dl , item:hp => hp }
        ....
    }

When you read a row with count == 0, just treat it as deleted. You shouldn't actually delete the 'webpage' column in that case, since there might be a concurrent add operation.
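The counting idea can be sketched with plain Python dicts standing in for the column families. This is only an in-memory model, not code against a Cassandra client: the names `keyword_rows`, `keyword_counts`, `add_item_to_keyword`, `remove_item_from_keyword` and `live_keywords` are made up for illustration, and the count is kept in its own mapping rather than inside the keyword row.

```python
from collections import defaultdict

# Plain dicts stand in for column families; nothing here talks to Cassandra.
keyword_rows = defaultdict(dict)     # keyword -> {column name: value}
keyword_counts = defaultdict(int)    # keyword -> number of items (the counter)


def add_item_to_keyword(keyword, item_key, webpage):
    """Add the item column, create 'webpage' if missing, bump the counter."""
    row = keyword_rows[keyword]
    row.setdefault("webpage", webpage)
    column = f"item:{item_key}"
    if column not in row:
        row[column] = item_key
        keyword_counts[keyword] += 1


def remove_item_from_keyword(keyword, item_key):
    """Drop the item column and decrement; the 'webpage' column is left alone."""
    if keyword_rows[keyword].pop(f"item:{item_key}", None) is not None:
        keyword_counts[keyword] -= 1


def live_keywords():
    """Keywords whose count is above zero; the rest are treated as deleted."""
    return sorted(k for k in keyword_rows if keyword_counts[k] > 0)


add_item_to_keyword("computer", "dl", "www.domain.com/computer")
add_item_to_keyword("computer", "hp", "www.domain.com/computer")
add_item_to_keyword("NOKIA", "no", "www.domain.com/nokia")
remove_item_from_keyword("NOKIA", "no")
print(live_keywords())
```

The membership check before each increment is free in this model; against a real cluster it would be a read before the write, so the counter can drift if the same add or remove is applied twice.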

edited Jun 01 '12 at 06:56

answered Jun 01 '12 at 06:49


Unfortunately counter columns can only be added in a special column family of CounterColumnType. This approach in general could work but you would need to look up the count in a separate counter cf created for this purpose. – nickmbailey Jun 01 '12 at 07:44

score 0 · Accepted Answer · answered Jun 01 '12 at 07:19

This is interesting, but I thought about another way: denormalizing the "webpage" column, e.g.:

    keywords{
        computer  { webpage:dl => www.domain.com/computer , item:dl => dl ,
                    webpage:hp => www.domain.com/computer , item:hp => hp }
        DELL      { webpage:dl => www.domain.com/dell ,     item:dl => dl }
        topseller { webpage:dl => www.domain.com/top ,      item:dl => dl }
        HP        { webpage:hp => www.domain.com/hp ,       item:hp => hp }
        NOKIA     { webpage:no => www.domain.com/nokia ,    item:no => no }
        phone     { webpage:no => www.domain.com/phone ,    item:no => no }
    }

In that case, when I delete item:xx I delete webpage:xx as well, and the row is removed automatically (it becomes a ghost) once no columns are left in it. However, I am still not sure this is such a bright idea.
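A small in-memory sketch of this denormalized layout, again with a plain dict instead of a real column family. The names `denorm_keywords`, `attach_item` and `detach_item` are hypothetical, and deleting the dict entry only models "a row with no columns left"; in a real cluster the empty key may still show up in range scans for a while, which is the ghost mentioned above.

```python
# keyword -> {column name: value}; a stand-in for the keywords CF, not a Cassandra client.
denorm_keywords = {}


def attach_item(keyword, item_key, webpage):
    """Write the item column together with its own copy of the webpage column."""
    row = denorm_keywords.setdefault(keyword, {})
    row[f"item:{item_key}"] = item_key
    row[f"webpage:{item_key}"] = webpage


def detach_item(keyword, item_key):
    """Delete both columns for the item; drop the row when nothing is left."""
    row = denorm_keywords.get(keyword)
    if row is None:
        return
    row.pop(f"item:{item_key}", None)
    row.pop(f"webpage:{item_key}", None)
    if not row:
        # Models a row without live columns, which no longer counts as data.
        del denorm_keywords[keyword]


attach_item("computer", "dl", "www.domain.com/computer")
attach_item("computer", "hp", "www.domain.com/computer")
attach_item("NOKIA", "no", "www.domain.com/nokia")
detach_item("NOKIA", "no")
print(sorted(denorm_keywords))
```

The trade-off visible here is that the webpage value is stored once per item, so changing a keyword's URL means rewriting every webpage:xx column in that row.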
