
When I try to create dummy variables, two rows are merged into one for some reason. The merged row ends up associated with two groups at once, so the result is no longer a valid dummy variable. The following code reproduces the issue.

library(dplyr)
library(tidyr)

df = data.frame(group = c(4, 2, 3, 3, 4, 4),
                time  = c(0.1, 0.2, 0.3, 0.3, 0.3, 0.4),
                age   = c(65, 86, 49, 71, 71, 76),
                year  = c(72, 74, 72, 76, 76, 77),
                death = c(1, 1, 1, 1, 1, 1))

df %>% mutate(i = 1) %>% spread(group, i, fill = 0)

After running the code, you can see that two rows have been merged into one, resulting in a subject that belongs to two groups at once. Is this an error in my code or in the function?

Brian

1 Answer


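Rows 4 and 5 are identical in every column except group (time = 0.3, age = 71, year = 76, death = 1). spread() uses all columns other than the key and value to identify a row, so it collapses those two into one. We can create a unique column with row_number() to keep them apart: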

library(dplyr)
library(tidyr)
df %>% 
    mutate(i = 1, rn = row_number()) %>%   # rn makes every row unique
    spread(group, i, fill = 0) %>%          # one 0/1 column per group
    select(-rn)                             # drop the helper column
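To confirm which rows collide before reshaping, a quick base R check can look for rows that are duplicated once the key column (group) is ignored. This is a minimal diagnostic sketch, assuming the df from the question; id_cols and dup are illustrative names, not part of the original answer.

```r
df <- data.frame(group = c(4, 2, 3, 3, 4, 4),
                 time  = c(0.1, 0.2, 0.3, 0.3, 0.3, 0.4),
                 age   = c(65, 86, 49, 71, 71, 76),
                 year  = c(72, 74, 72, 76, 76, 77),
                 death = c(1, 1, 1, 1, 1, 1))

# spread() identifies rows by every column other than the key (group) and value (i)
id_cols <- setdiff(names(df), "group")

# TRUE for rows whose identifier values already appeared in an earlier row
dup <- duplicated(df[id_cols])
print(which(dup))

# Show every row involved in a collision (both the first and later copies)
print(df[dup | duplicated(df[id_cols], fromLast = TRUE), ])
```

Any row flagged here will be folded into another row by spread() unless a unique column such as rn is added.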

Or using pivot_wider() (spread() is superseded in current tidyr, and pivot_wider() is its recommended replacement):

df %>%
   mutate(rn = row_number(), i = 1) %>%
   pivot_wider(names_from = group, values_from = i, values_fill = list(i = 0)) %>%
   select(-rn)
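As an alternative that avoids reshaping altogether (a different technique, not the answer's method), base R's model.matrix() builds one indicator column per factor level row by row, so no rows can be merged. A minimal sketch, assuming the df from the question; dummies and result are illustrative names.

```r
df <- data.frame(group = c(4, 2, 3, 3, 4, 4),
                 time  = c(0.1, 0.2, 0.3, 0.3, 0.3, 0.4),
                 age   = c(65, 86, 49, 71, 71, 76),
                 year  = c(72, 74, 72, 76, 76, 77),
                 death = c(1, 1, 1, 1, 1, 1))

# "- 1" drops the intercept so every group level gets its own 0/1 column
dummies <- model.matrix(~ factor(group) - 1, data = df)

# Rename columns from "factor(group)2" etc. to the bare level names
colnames(dummies) <- levels(factor(df$group))

# Row order is preserved, so the dummies can be bound straight onto df
result <- cbind(df, dummies)
print(result)
```

Because each input row produces exactly one output row, every subject ends up with a single 1 across the group columns.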
akrun