Transform DF Column Values to Matrix in R

Question

I want to count unique values of categorical variable based of a column based on Date.

I want result as a matrix where column names are the values categorical variable, row names will be unique Date values and their cell values is the unique count.

The below links solves the group by problem but I am looking for the transformed df:

How to add count of unique values by group to R data.frame

R: Extract unique values in one column grouped by values in another column

My df has more than 50,000 rows and looks like:

dat <- data.frame(Date = c('06/08/2018','06/08/2018','07/08/2018','07/08/2018','08/08/2018','09/08/2018','09/08/2018','11/08/2018','11/08/2018','13/08/2018'),
                  Type= c('A','B','C','A','B','A','A','B','C','C'))

I want my resultant matrix to have "A", "B" ,"C" as new columns, "Date" as the rows and values in matrix as the unique count, shown in below image:

Also, it would be great that we don't hardcode categorical values. So, in future if instead of 3 it becomes 4, then code automatically handles it.

have you tried using `spread` from the `tidyr` package? – Ben G Jun 07 '18 at 19:32 — Ben G, Jun 07 '18 at 19:32
Never tried these packages. Not an expert in R – Rahul Agarwal Jun 07 '18 at 19:33 — Rahul Agarwal, Jun 07 '18 at 19:33

score 2 · Answer 1 · answered Jun 07 '18 at 19:37

2

How about using table...

mat <- table(dat$Date, dat$Type)

mat

             A B C
  06/08/2018 1 1 0
  07/08/2018 1 0 1
  08/08/2018 0 1 0
  09/08/2018 2 0 0
  11/08/2018 0 1 1
  13/08/2018 0 0 1

answered Jun 07 '18 at 19:37

Andrew Gustar

17,295
1
22
32

score 1 · Accepted Answer · answered Jun 07 '18 at 19:33

1

What you're looking for is dcast():

dcast(dat, Date ~ Type, fun.aggregate = length, value.var = "Type")

This function will quickly aggregate your data based upon the fun.aggregate argument (in your case length().

answered Jun 07 '18 at 19:33

C-x C-c

1,261
8
20

That won't work in this case. Do you want to take a mean of the subsequent columns? I.e. the mean of A, B, and C? – C-x C-c Jun 07 '18 at 19:44

score 1 · Answer 3 · answered Jun 07 '18 at 20:01

1

This uses spread

library(tidyverse)

spread_data <- (data, key = type, value = 2)

answered Jun 07 '18 at 20:01

Ben G

4,148
2
22
42

Transform DF Column Values to Matrix in R

3 Answers3