Creating a New Column of True/False in R

Question

I am trying to create a new column in my dataset. So far, I have imported a JSON file into R with a column full of different words ("purple", "red", "blue", etc.) each observation has some combination of these words. My goal is to create a new column with the header of the words apparent ("purple", "red", "blue", etc.). I want that column to have True or Falses depending on whether the observations exhibit that color. I tried to use the subset function as well as manually doing so but there are over 300 different observations, making that very inconvenient. I greatly appreciate any help!

For example:

Observations     Color
1                Blue
2                Red, Blue
3                Blue, Green
4                Purple
5                Yellow, Orange

and now I want

Observations     Color       Red       Yellow        Orange    Blue
1                Blue        False     False         False     True
2                Red, Blue   True      False         False     True

etc

This is my first question on this site so i apologize if there are any problems.

Please see ["How to make a great R reproducible example?"](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). It will help people reply to your question in a more useful manner. Also I'm pretty sure this has been asked a few times already... — nico, Jun 30 '15 at 12:43
For instance: http://stackoverflow.com/questions/7658316/create-new-column-based-on-4-values-in-another-column and http://stackoverflow.com/questions/26278127/create-new-column-based-on-values-in-2-other-columns (your solution may be easier but the idea is the same). — nico, Jun 30 '15 at 12:44
possible duplicate of [Split string in rows to create new columns](http://stackoverflow.com/questions/27630588/split-string-in-rows-to-create-new-columns) — user20650, Jun 30 '15 at 12:57
From duplicate... try `splitstackshape::cSplit_e(dat, "Color", ",", type = "character", fill = 0)` — user20650, Jun 30 '15 at 12:59

Molx · Answer 1 · 2015-06-30T13:33:18.550

You can simply iterate over the column names you want to create and use grepl to find if they are present in the Color column:

dat <- read.table(text="Observations     Color
1                Blue
                  2                Red,Blue
                  3                Blue,Green
                  4                Purple
                  5                Yellow,Orange", header=T, stringsAsFactors=F)
# I removed the space after the commas to facilitate the data.frame creation.

cols <- c("Red", "Yellow", "Orange", "Blue")

for (i in cols) dat[[i]] <- grepl(i, dat$Color)

Result:

> dat
  Observations         Color   Red Yellow Orange  Blue
1            1          Blue FALSE  FALSE  FALSE  TRUE
2            2      Red,Blue  TRUE  FALSE  FALSE  TRUE
3            3    Blue,Green FALSE  FALSE  FALSE  TRUE
4            4        Purple FALSE  FALSE  FALSE FALSE
5            5 Yellow,Orange FALSE   TRUE   TRUE FALSE

Edit:

If you want columns for all colors, a better way to create the vector is as proposed by Robert in the comments:

cols <- unique(unlist(strsplit(dat$Color, ",")))
#You might have to change from "," to ", " if you have white spaces after the commas
#or even ",\\s?" if they aren't always there.

The new result will be:

  Observations         Color   Red Yellow Orange  Blue Green Purple
1            1          Blue FALSE  FALSE  FALSE  TRUE FALSE  FALSE
2            2      Red,Blue  TRUE  FALSE  FALSE  TRUE FALSE  FALSE
3            3    Blue,Green FALSE  FALSE  FALSE  TRUE  TRUE  FALSE
4            4        Purple FALSE  FALSE  FALSE FALSE FALSE   TRUE
5            5 Yellow,Orange FALSE   TRUE   TRUE FALSE FALSE  FALSE

Udpate colors: `cols <- unique(unlist(strsplit(dat$Color, ",", fixed = TRUE)))` — Robert, Jun 30 '15 at 13:03
@Robert I didn't notice he said he wanted columns for every color, thanks. — Molx, Jun 30 '15 at 13:34

Robert · Answer 2 · 2015-06-30T13:15:39.917

Try something like this:

example <- data.frame(colors=c("A,B", "A", "B", "F", "C", "C,G", "C", "D", "E", "F"),stringsAsFactors = F)
cols <- sort(unique(unlist(strsplit(example$colors, ",", fixed = TRUE))))
dummies= sapply(cols,function(co)grepl(co, example$colors))

          A     B     C     D     E     F     G
 [1,]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
 [2,]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
 [3,] FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
 [4,] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
 [5,] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
 [6,] FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
 [7,] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
 [8,] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
 [9,] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
[10,] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE

Nick Kennedy · Answer 3 · 2015-06-30T15:57:53.693

The base R explanation is much simpler than I can achieve with dplyr, but for interest here's a dplyr solution:

cols <- unique(unlist(strsplit(dat$Color, ",", fixed = TRUE)))
dat %>% mutate_(.dots = sapply(cols, function(col) interp(~grepl(col, Color), col = col)))

And here's a way using plyr and magrittr:

cols %>% 
  laply(grepl, dat$Color) %>%
  t %>%
  data.frame %>%
  setNames(cols) %>%
  cbind(dat, .)

And another:

dat %>% adply(1, . %$%
                Color %>%
                strsplit(",") %>%
                extract2(1) %>%
                factor(levels = cols) %>%
                table %>%
                is_greater_than(0))

This makes use of the fact that magrittr lets you create an anonymous function chain.

Creating a New Column of True/False in R

3 Answers3

Edit: