0

I want to count unique values of categorical variable based of a column based on Date.

I want result as a matrix where column names are the values categorical variable, row names will be unique Date values and their cell values is the unique count.

The below links solves the group by problem but I am looking for the transformed df:

How to add count of unique values by group to R data.frame

R: Extract unique values in one column grouped by values in another column

My df has more than 50,000 rows and looks like:

dat <- data.frame(Date = c('06/08/2018','06/08/2018','07/08/2018','07/08/2018','08/08/2018','09/08/2018','09/08/2018','11/08/2018','11/08/2018','13/08/2018'),
                  Type= c('A','B','C','A','B','A','A','B','C','C'))

I want my resultant matrix to have "A", "B" ,"C" as new columns, "Date" as the rows and values in matrix as the unique count, shown in below image:

Resultant Matrix

Also, it would be great that we don't hardcode categorical values. So, in future if instead of 3 it becomes 4, then code automatically handles it.

Community
  • 1
  • 1
Rahul Agarwal
  • 4,034
  • 7
  • 27
  • 51

3 Answers3

2

How about using table...

mat <- table(dat$Date, dat$Type)

mat

             A B C
  06/08/2018 1 1 0
  07/08/2018 1 0 1
  08/08/2018 0 1 0
  09/08/2018 2 0 0
  11/08/2018 0 1 1
  13/08/2018 0 0 1
Andrew Gustar
  • 17,295
  • 1
  • 22
  • 32
1

What you're looking for is dcast():

dcast(dat, Date ~ Type, fun.aggregate = length, value.var = "Type")

This function will quickly aggregate your data based upon the fun.aggregate argument (in your case length().

C-x C-c
  • 1,261
  • 8
  • 20
  • That won't work in this case. Do you want to take a mean of the subsequent columns? I.e. the mean of A, B, and C? – C-x C-c Jun 07 '18 at 19:44
1

This uses spread

library(tidyverse)

spread_data <- (data, key = type, value = 2)
Ben G
  • 4,148
  • 2
  • 22
  • 42