0

In short I am lookinkg for a way to calculate the unique number of forecasts from analysts for a company for a given year.

Input

dt <- data.table(CUSIP = c(1,1,1,1,2,2,2,2,2,2), Fdate = c("2000-12-31","2000-12-31","2001-12-31","2001-12-31","2000-12-31","2000-12-31","2000-12-31", "2001-12-31", "2001-12-31", "2001-12-31"), Analys = c(7649, 9845, 44283, 7649, 9845, 37856, 63528, 9845, 88254, 88254))

It looks like a question already asked Count unique values of a column by pairwise combinations of another column in R however I can't get it to work.

With the described solution of that question, the result is not what I am looking for. As it counts and presents the combinations of the fdate and analysts. However, I want the result of the number of unique analyst for a given company and forecast year.

Result from solution count unique values of a column by pairwise combinations of another column in r:

dt.cj <- merge(dt, dt, by ="Fdate", all = T, allow.cartesian = T)
dt.res <- dt.cj[Analys.x < Analys.y, .(cnt = length(unique(CUSIP.x))),by = .(Fdate, Analys.x, Analys.y)]

CUSIP = Unique ID for a company. Fdate = The date the forecast is based for. Analys = The ID of a analyst.

Desired output:

output <- data.table(Reg.ID = c(1,1,2,2), Location = c("2000-12-31","2001-12-31"), Count = c(2,2,3,2))
Patrick
  • 106
  • 1
  • 7

1 Answers1

1
dt[, .(Count = uniqueN(Analys)), by = .(CUSIP, Fdate)]
#    CUSIP      Fdate Count
# 1:     1 2000-12-31     2
# 2:     1 2001-12-31     2
# 3:     2 2000-12-31     3
# 4:     2 2001-12-31     2

The example you linked in the question was overly complicated because it used pairwise combinations of a single column --- it had to match up a column with itself in every possible way. You want unique observations by group, and it happens that your group is defined by 2 columns. It's a much simpler problem.

Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
  • 1
    You could also use `uniqueN` instead of `length(unique(...` – markus Jun 18 '19 at 20:22
  • Thank you very much for your solution. That is exactly what I am looking for! @Markus the time for running the code with uniqueN is larger, than without. – Patrick Jun 18 '19 at 20:22