I have a fairly messy column, called Themes, in a dataset of different projects.

There are 10 different themes in total, all jumbled up in a random order (image: My messy column).

What I am trying to do is create a column for each theme, holding either 0 (= project does not include the theme) or 1 (= project does include the theme), depending on whether the specific project/row contains that theme.

For example, please see the image: My wished output

What I have tried to do is use separate:

Themes_split <- Merge %>%
               separate(Themes, into = c("CP", "CG", 
                      "Edu", "Health", "Nut", "Ill", "Liv", "Hum", 
                      "Cross-Thematic", "Non-Thematic"), sep = ",", 
                       na.rm=TRUE)

But the output does not account for the random order in which the themes are listed in the column; please see the image of the output: My actual output

How do I make R recognize the different themes and assign the value 0 or 1 to each new column, depending on whether the specific project contains that theme?

I am looking forward to a bit of help - Thank you

mukund
BloopFloopy

1 Answer

You haven't given us your Themes data. However, have a look at the example in the tidyr::separate_rows help page. You can adapt it to your case:

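library(tidyr)   # separate_rows(), spread()
library(dplyr)   # provides the %>% pipe used further down

df <- data.frame(
  x = 1:3,
  y = c("a", "d,e,f", "g,h"),
  z = c("1", "2,3,4", "5,6"),
  stringsAsFactors = FALSE
)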

> df
  x     y     z
1 1     a     1
2 2 d,e,f 2,3,4
3 3   g,h   5,6

separate_rows(df, y, z, convert = TRUE)

  x y z
1 1 a 1
2 2 d 2
3 2 e 3
4 2 f 4
5 3 g 5
6 3 h 6

You can extend that by spreading the data into one column per value:

separate_rows(df, y, z, convert = TRUE) %>% spread(y, z, fill= 0)
  x a d e f g h
1 1 1 0 0 0 0 0
2 2 0 2 3 4 0 0
3 3 0 0 0 0 5 6
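
Applied to your case, the same idea might look roughly like the sketch below. Since I don't have your data, the Merge data frame here is a made-up stand-in, and the Project column and the helper column present are hypothetical names I introduced for illustration. Note the explicit sep: the default separator of separate_rows splits on any non-alphanumeric character, so it would break "Cross-Thematic" and "Non-Thematic" apart at the hyphen.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the real data: one row per project,
# themes listed in arbitrary order, separated by commas.
Merge <- data.frame(
  Project = c("P1", "P2", "P3"),
  Themes = c("Edu,Health", "Health, CP, Nut", "Non-Thematic"),
  stringsAsFactors = FALSE
)

Themes_split <- Merge %>%
  # One row per project/theme pair; split on commas (plus any spaces after them)
  separate_rows(Themes, sep = ",\\s*") %>%
  # Drop repeated project/theme pairs, which would make spread() fail
  distinct() %>%
  # Mark each theme that occurs for a project with 1
  mutate(present = 1) %>%
  # One column per theme; themes missing from a project are filled with 0
  spread(Themes, present, fill = 0)

Themes_split
```

Only themes that occur at least once in the data get a column this way, so a theme that no project uses would have to be added by hand. In newer versions of tidyr, spread is superseded by pivot_wider, which can be used for the same step.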
Stephen Henderson