Adding multiple columns in a dplyr mutate call

Question

I have a data frame with a dot-separated character column:

> set.seed(310366)
> tst = data.frame(x=1:10,y=paste(sample(c("FOO","BAR","BAZ"),10,TRUE),".",sample(c("foo","bar","baz"),10,TRUE),sep=""))
> tst
    x       y
1   1 BAR.baz
2   2 FOO.foo
3   3 BAZ.baz
4   4 BAZ.foo
5   5 BAZ.bar
6   6 FOO.baz
7   7 BAR.bar
8   8 BAZ.baz

and I want to split that column into two new columns containing the parts on either side of the dot. str_split_fixed from package stringr can do the job quite nicely. All my values are definitely two parts separated by a dot so I can do:

> require(stringr)
> str_split_fixed(tst$y,"\\.",2)
      [,1]  [,2] 
 [1,] "BAR" "baz"
 [2,] "FOO" "foo"
 [3,] "BAZ" "baz"
 [4,] "BAZ" "foo"
 [5,] "BAZ" "bar"
 [6,] "FOO" "baz"
 [7,] "BAR" "bar"

Now I could just cbind that to my data frame but I thought I'd figure out how to do that in a dplyr pipeline. First I thought mutate could do it in one:

> tst %.% mutate(parts=str_split_fixed(y,"\\.",2))
Error: wrong result size (20), expected 10 or 1

I can get mutate to do it in two:

> tst %.% mutate(part1=str_split_fixed(y,"\\.",2)[,1], part2=str_split_fixed(y,"\\.",2)[,2])
    x       y part1 part2
1   1 BAR.baz   BAR   baz
2   2 FOO.foo   FOO   foo
3   3 BAZ.baz   BAZ   baz
4   4 BAZ.foo   BAZ   foo
5   5 BAZ.bar   BAZ   bar
6   6 FOO.baz   FOO   baz

but that's running the string split twice.

"Best" I can do so far in a dplyr way is this (which I only discovered while writing this question...):

> tst %.% do(cbind(.,data.frame(parts=str_split_fixed(.$y,"\\.",2))))
    x       y parts.1 parts.2
1   1 BAR.baz     BAR     baz
2   2 FOO.foo     FOO     foo
3   3 BAZ.baz     BAZ     baz
4   4 BAZ.foo     BAZ     foo
5   5 BAZ.bar     BAZ     bar

which isn't bad, but loses a lot of the readability of piped things in R. Is there a simple approach using mutate that I've missed?

score 42 · Accepted Answer · edited Jul 08 '15 at 16:44

42

You can use separate() from tidyr in combination with dplyr:

tst %>% separate(y, c("y1", "y2"), sep = "\\.", remove=FALSE)

    x       y  y1  y2
1   1 BAR.baz BAR baz
2   2 FOO.foo FOO foo
3   3 BAZ.baz BAZ baz
4   4 BAZ.foo BAZ foo
5   5 BAZ.bar BAZ bar
6   6 FOO.baz FOO baz
7   7 BAR.bar BAR bar
8   8 BAZ.baz BAZ baz
9   9 FOO.bar FOO bar
10 10 BAR.foo BAR foo

Setting remove=TRUE will remove column y

edited Jul 08 '15 at 16:44

zx8754

52,746
12
114
209

answered Jul 24 '14 at 14:41

erc

10,113
11
57
88

1

Unfortunately `separate` gives an error when the result after splitting has more columns than specified. In this sense it works rather like `str_split` than `str_split_fixed`. See [this feature request](https://github.com/hadley/tidyr/issues/19). But still a very helpful answer, thanks. – Beasterfield Aug 14 '14 at 11:47
2

using `extra = "merge"` parameter this can be controlled – JelenaČuklina Jan 21 '16 at 15:08
1

This is a great answer! Thanks – Huanfa Chen Jul 26 '17 at 13:56
Note that by default anything not alphanumeric will be used as a separator, so setting `sep` isnot necessary in this case. – moodymudskipper Feb 18 '19 at 21:25

DomQ · Answer 2 · 2020-10-01T21:19:50.160

This answer applies here as well; the following approach is both tidyverse-idiomatic and more performant than separate() (as of 2020):

set.seed(310366)
tst = data.frame(x=1:10,y=paste(sample(c("FOO","BAR","BAZ"),10,TRUE),".",sample(c("foo","bar","baz"),10,TRUE),sep=""))

library(dplyr)
library(purrr)

tst %>% 
  mutate(tmp_chunks = stringr::str_split(y, fixed("."),  n = 2)) %>%
  mutate(y1 = map_chr(tmp_chunks, 1),
         y2 = map_chr(tmp_chunks, 2)) %>%
  select(-tmp_chunks)

... Or if you don't want y anymore after splitting it, you can change the last line to

  select(-tmp_chunks, -y)

Adding multiple columns in a dplyr mutate call

2 Answers2

Linked

Related