1

I have data that looks like this (image: "Sort of like a standard data format").

How do I convert it to this format in RStudio? (image: "Wide Format")

Sorry for the pictures; I don't know how to create tables here.

Each column can take many other values (e.g. the Status column can also be 'Divorced', 'Widowed', etc.), and I would like every distinct value to become its own column.

Nakx
Shu Ang

2 Answers

1

Assume the table you described is stored in a data frame named df. The following operations produce the desired output.

df <- data.frame(ID = 1:4,Status = c("Single", "Single", "Married","Married"),
                 Gender = c("M", "F", "F","F"),Age_Group =c("2","3","2","2"))


knitr::kable(df)

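# dummy_cols() only expands character and factor columns by default,
# so make sure Age_Group is not stored as a number
df$Age_Group <- as.character(df$Age_Group)

# Adds one 0/1 column per distinct value, e.g. Status_Single, Gender_F
df1 <- fastDummies::dummy_cols(df)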

Then delete the original columns, now that their dummies have been created.

# Drop the original Status, Gender and Age_Group columns (positions 2 to 4)
df1 <- df1[, -c(2, 3, 4)]

View(df1)
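
If installing fastDummies is not an option, the same idea can be sketched in base R. This is only a rough equivalent, not what dummy_cols() does internally: the names cat_cols, dummies and wide are made up for illustration, and the Column_Value naming is chosen to mimic the output above.

```r
# Same sample data as above
df <- data.frame(ID = 1:4,
                 Status = c("Single", "Single", "Married", "Married"),
                 Gender = c("M", "F", "F", "F"),
                 Age_Group = c("2", "3", "2", "2"),
                 stringsAsFactors = FALSE)

# Columns to expand (hypothetical helper name)
cat_cols <- c("Status", "Gender", "Age_Group")

# For each column, compare every row against every distinct value;
# outer() returns a logical matrix, and adding 0L turns it into 0/1 integers
dummies <- lapply(cat_cols, function(col) {
  lv <- sort(unique(as.character(df[[col]])))
  m <- outer(as.character(df[[col]]), lv, "==") + 0L
  colnames(m) <- paste(col, lv, sep = "_")
  m
})

# Keep ID and bind all indicator columns next to it
wide <- cbind(df["ID"], do.call(cbind, dummies))
wide
```

Because the distinct values are read from the data, extra categories such as 'Divorced' or 'Widowed' would get their own columns without any change to the code.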
Ritchie Sacramento
Yogesh
0

Using the sample data set @Yogesh created, you can reshape the data with the following code. It uses the melt function from the reshape package and the spread function from the tidyr package.

df <- data.frame(ID = 1:4,Status = c("Single", "Single", "Married","Married"),
             Gender = c("M", "F", "F","F"),Age_Group =c("2","3","2","2"))
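# Stack Status, Gender and Age_Group into variable/value pairs
df2 <- reshape::melt(df, id = c("ID"))
# Build the future column names, e.g. "Status-Single"
df2$new_col <- paste0(df2$variable, "-", df2$value)
df2 <- df2[, !(names(df2) %in% c("variable", "value"))]
# Count how often each ID/new_col pair occurs (0 or 1 here)
df3 <- as.data.frame(table(df2$ID, df2$new_col))
# Spread the counts into one column per new_col value
df4 <- tidyr::spread(df3, Var2, Freq)
colnames(df4)[1] <- "ID"
df4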

This is the output generated by the above code:

  ID Age_Group-2 Age_Group-3 Gender-F Gender-M Status-Married Status-Single
1  1           1           0        0        1              0             1
2  2           0           1        1        0              0             1
3  3           1           0        1        0              1             0
4  4           1           0        1        0              1             0
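
As a side note, tidyr now marks spread() as superseded in favour of pivot_wider(), so the same reshaping can probably be done with tidyr alone. The following is a sketch under that assumption (it needs tidyr 1.1.0 or later for the scalar values_fill); the names long, flag and wide are made up for illustration, and the column order may differ from the table above.

```r
library(tidyr)

df <- data.frame(ID = 1:4,
                 Status = c("Single", "Single", "Married", "Married"),
                 Gender = c("M", "F", "F", "F"),
                 Age_Group = c("2", "3", "2", "2"),
                 stringsAsFactors = FALSE)

# One row per ID and original column
long <- pivot_longer(df, cols = -ID, names_to = "variable", values_to = "value")

# Marker that becomes the 1s in the wide table
long$flag <- 1L

# One column per variable/value pair; missing combinations are filled with 0
wide <- pivot_wider(long,
                    names_from = c(variable, value),
                    values_from = flag,
                    values_fill = 0L,
                    names_sep = "-")
wide
```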

Hope this helps!

RD_