2

I want to count the amount of different values in a combination of columns in a data.table within a certain group.

Simple example: the data looks like

      room | object
-----------------------
   kitchen | dishwasher
   kitchen | oven
livingRoom | sofa

Now I want to know: How many different objects are in every room? The answer is simple:

library(data.table)
dt = data.table(room = c("kitchen", "kitchen", "livingRoom"), object = c("dishwasher", "oven", "sofa"))
dt[, .(amount = uniqueN(object)), by=room]

However, the situation is slightly more tricky if the object is described by multiple columns. For example: The objects have a color and the data looks like this:

      room | object     | color
-------------------------------
   kitchen | dishwasher | white
   kitchen | oven       | white
livingRoom | toy        | red
livingRoom | toy        | red
livingRoom | toy        | green

Now I want to know: How many distinct object-color combinations are there is every room? I.e. the answer I want is:

      room | amount
-------------------
   kitchen | 2
livingRoom | 2

I tried to do the natural thing: just write more columns in uniqueN but it did not work:

dt = data.table(room = c("kitchen", "kitchen", "livingRoom", "livingRoom", "livingRoom")
                ,object = c("dishwasher", "oven", "toy", "toy", "toy")
                ,color = c("white", "white", "red", "red", "green"))
dt[, .(amount = uniqueN(object, color)), by=room] # error
dt[, .(amount = uniqueN(.(object, color))), by=room] # error

Surely I could just put 'object' and 'color' together into one column and then just use uniqueN on the single combination column but that is the poor mans choice for something I am sure is there but I can't figure out...

Anyone knows how to do this?

THX

Fabian Werner
  • 957
  • 11
  • 19
  • It seems that you're not asking "how many distinct object-color combinations are in each room?" but how often does a certain object-color combination appear in a room. Otherwise toy-green would also be 2 in the livingroom – talat Dec 29 '17 at 08:50
  • @docendodiscimus: You were absolutely right, I was confused and asked for the wrong example – Fabian Werner Dec 29 '17 at 09:03

1 Answers1

1

According to ?uniqueN, it takes a vector or data.frame/data.table object as input.

x -A data.table. uniqueN accepts atomic vectors and data.frames as well.

So, after grouping by 'room', apply the uniqueN on the Subset of data.table (.SD)

dt[, .(amount = uniqueN(.SD)), by = room]
#         room amount
#1:    kitchen      2
#2: livingRoom      2

If there are more number of columns, then specify the column of interest in .SDcols with either the column index

dt[, .(amount = uniqueN(.SD)), by = room, .SDcols = 2:3]

or column name

dt[, .(amount = uniqueN(.SD)), by = room, .SDcols = c("object", "color")]
akrun
  • 874,273
  • 37
  • 540
  • 662
  • I am confused... maybe it was a silly question... but why did they come up with the uniqueN function in the first place if I can just reduce every such problem to a "count all in a group by" problem? I.e. is it always true that dt[, .(amount = .N), by=] === dt[, .(amount = uniqueN(col1)), by=]? – Fabian Werner Dec 29 '17 at 08:55
  • Sorry, I was ultra confused and asked the wrong question... I'll edit – Fabian Werner Dec 29 '17 at 09:01
  • @FabianWerner Thanks, could you with edit with a new example as I don't find `uniqueN` the right tool here – akrun Dec 29 '17 at 09:02
  • I want to have the one-step version of `unique(dt[, .(room, object, color)])[, .(amount = .N), by=room]` :-) – Fabian Werner Dec 29 '17 at 09:05
  • @FabianWerner Your code gives `2 2` which is what the `uniqueN` gives you as output – akrun Dec 29 '17 at 09:06
  • Indeed, I want this `2 2`... As I said: I was a little confused. How to get that `2 2` using uniqueN? This 'paste' has an implicit cast to char in it, right: As I said, I do not want to do this, I want to cound natively... – Fabian Werner Dec 29 '17 at 09:08
  • That only solves the thing in the special case that there are no other attributes. If you add `dt = dt[, ROW_NUMBER := .I]` then your answer is not correct... – Fabian Werner Dec 29 '17 at 09:11
  • @FabianWerner Updated the post – akrun Dec 29 '17 at 09:13
  • 1
    Ah yes, the final one was the thing I was looking for... sorry for the weird description, I'll accept :-D – Fabian Werner Dec 29 '17 at 09:16