Many Hot encoder in R

Question

I have a dataframe such as below, and i want to many hot this based on week,

this is my ideal output:

id    week1   week2    week3    week4    week5
345    1        2       0         0        1
351    0        1       0         0        0
222    0        0       0         1        0
264    0        0       1         0        1
277    0        1       0         0        0 
...

and my idea for this problem was based on combining one hot encoded of this dataframe, but it was very complex,

any body knows can i get this output in R?

Esben Eickhardt · Answer 1 · 2018-10-06T09:05:25.820

0

I am sure this can be done much more elegantly, but this gets the job done.

# Libraries
library(dplyr)
library(tidyr)

# Dataframe
data <- "id    week
345    1
351    2
222    4
264    3
345    5
277    2
345    2
345    2
264    5"
df <- read.table(text = data, header = TRUE)

# All at once
df <- df %>% 
  group_by(id, week) %>% 
  summarise(count = n()) %>%
  mutate(week = paste0("week", week)) %>%
  spread(week, count)

# Setting NA to zero
df[is.na(df)] <- 0

edited Oct 06 '18 at 09:05

answered Oct 06 '18 at 08:57

Esben Eickhardt

3,183
2
35
56

Thanks @Esben Eickhardt, but it returns all ids , i want just unique ids at the end. – AHAD Oct 06 '18 at 09:22
Oh well, someone else almost copy pasted my answer, and gets it :P I am glad you could use his answer. – Esben Eickhardt Oct 06 '18 at 10:35

score 0 · Accepted Answer · answered Oct 06 '18 at 09:23

Using tidyverse:

df %>%
  mutate(week = paste("week", week, sep = "")) %>%
  group_by(id, week) %>% 
  summarise(n = n()) %>%
  ungroup() %>%
  spread(key = week, value = n) %>% 
  mutate_all(funs(replace(., is.na(.), 0)))

# A tibble: 5 x 6
     id week1 week2 week3 week4 week5
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1  222.    0.    0.    0.    1.    0.
2  264.    0.    0.    1.    0.    1.
3  277.    0.    1.    0.    0.    0.
4  345.    1.    2.    0.    0.    1.
5  351.    0.    1.    0.    0.    0.

Many Hot encoder in R

2 Answers2