Your approach is a good start!
my.DT <- myDT[,.(Type=Type, number_of_offices=nrow(my.DT[my.DT$Type=="office",]), number_of_apartments=nrow(my.DT$Type=="apparment",], by="Property")
The brackets don't match, I assume there was some copy-paste error. These can efficiently avoided by following the guidelines on how to create a great reproducible example & reprex
package. The code in your question could then look something like this:
library(data.table)
#> Warning: package 'data.table' was built under R version 3.6.2
rooms_dt <- fread(text =
"Property Type
1 apartment
1 office
2 office
2 office
3 apartment
3 apartment
3 office
")
rooms_dt[, .(
Type=Type,
number_of_offices=nrow(rooms_dt[rooms_dt$Type=="office",]),
number_of_apartments=nrow(rooms_dt[rooms_dt$Type=="apartment",])
), by="Property"]
#> Property Type number_of_offices number_of_apartments
#> 1: 1 apartment 4 3
#> 2: 1 office 4 3
#> 3: 2 office 4 3
#> 4: 2 office 4 3
#> 5: 3 apartment 4 3
#> 6: 3 apartment 4 3
#> 7: 3 office 4 3
Created on 2020-03-12 by the reprex package (v0.3.0)
As you noticed, the problem with that approach is that the j
expression that gets evaluated for every group (= every Property
) the numbers of all apartment/office rows are counted not just the one of the current Property
. This could be avoided by subsetting not only for the correct type but also the correct property (available in j
as the special variable .BY
). This already yields the desired result:
rooms_dt[, .(
Type=Type,
number_of_offices=nrow(rooms_dt[rooms_dt$Property == .BY & rooms_dt$Type=="office",]),
number_of_apartments=nrow(rooms_dt[rooms_dt$Property == .BY & rooms_dt$Type=="apartment",])
), by="Property"]
#> Property Type number_of_offices number_of_apartments
#> 1: 1 apartment 1 1
#> 2: 1 office 1 1
#> 3: 2 office 2 0
#> 4: 2 office 2 0
#> 5: 3 apartment 1 2
#> 6: 3 apartment 1 2
#> 7: 3 office 1 2
However, this solution can be simplified quite a bit:
First and foremost, the columns of the data.table
can be accessed directly by their name within []
:
rooms_dt[, .(
Type=Type,
number_of_offices=nrow(rooms_dt[Property == .BY & Type=="office",]),
number_of_apartments=nrow(rooms_dt[Property == .BY & Type=="apartment",])
), by="Property"]
Furthermore, data.table
provides the current data.table subsetted [sic] to the current group as .SD
, which allows to simplify this further:
rooms_dt[, .(
Type=Type,
number_of_offices=nrow(.SD[Type=="office",]),
number_of_apartments=nrow(.SD[Type=="apartment",])
), by="Property"]
Since not only .SD is subsetted, but all columns are, it is even simpler to just work with them directly:
rooms_dt[, .(
Type = Type,
number_of_offices = sum(Type=="office"),
number_of_apartments = sum(Type=="apartment")
), by="Property"]
One of the strengths of data.table
is that in contrast to most R
stuff is, that they are mutable. This allows to just add the new columns directly to the old data.table without creating a new one:
rooms_dt[, number_of_offices := sum(Type=="office"), by="Property"]
rooms_dt[, number_of_apartments := sum(Type=="apartment"), by="Property"]
This culd also be written as:
rooms_dt[, c("number_of_offices", "number_of_apartments") := list(sum(Type=="office"), sum(Type=="apartment")), by="Property"]
or as :
rooms_dt[, `:=`(number_of_offices = sum(Type=="office"), number_of_apartments = sum(Type=="apartment")), by="Property"]
This was not as easy as it could be, probably because it is rearly needed, often it is desirable to have one of these instead:
rooms_dt[, count_in_property := .N, by=.(Property, Type)]
rooms_dt
#> Property Type number_in_property
#> 1: 1 apartment 1
#> 2: 1 office 1
#> 3: 2 office 2
#> 4: 2 office 2
#> 5: 3 apartment 2
#> 6: 3 apartment 2
#> 7: 3 office 1
or
dcast(rooms_dt[, .(number_of = .N), by=.(Property, Type)], Property ~ Type, fill = 0)
#> Using 'number_of' as value column. Use 'value.var' to override
#> Property apartment office
#> 1: 1 1 1
#> 2: 2 0 2
#> 3: 3 2 1