
I have a column within my tibble called sourceMedium with strings like this:

"apples / pears"

I want to split this into two new columns and then remove the original one. I'm trying to do this within a dplyr chain of operations, but I just can't get it to work:

wrangled <- gaDataSessionsAggregate %>%
    mutate(source = unlist(strsplit(sourceMedium, "/"))[1],
           medium = unlist(strsplit(sourceMedium, "/"))[2])

This runs, but I only get one unique value in each of the two new fields. There should be many unique values in each new field, based on the original column. It looks like R is taking the first value in the tibble and applying it to every other row.
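To see why every row ends up with the same value, here is a minimal base-R sketch of what `unlist(strsplit(...))[1]` does when given the whole column at once. The two sample strings are hypothetical stand-ins for `sourceMedium`, and the split pattern `" / "` (with spaces) is assumed so the pieces come out without stray whitespace; the question's `"/"` would leave `"apples "` and `" pears"`.

```r
# Hypothetical stand-in for the sourceMedium column
x <- c("apples / pears", "red / blue")

# strsplit() is vectorised: it returns a list with one character vector per string
parts <- strsplit(x, " / ")

# unlist() flattens the pieces from *every* row into one long vector
flat <- unlist(parts)
print(flat)

# [1] therefore picks only the very first piece; inside mutate() that single
# value is recycled down the whole column, which is the symptom described above
print(flat[1])
```

In other words, the problem is not that dplyr remembers the first row; the expression really does evaluate to one value for the entire column.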

What is the shortest, most "dplyr-esque" way to take the field sourceMedium and split it into two new fields, "source" and "medium", based on a slash separator "/"?

Doug Fir

1 Answer


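dplyr doesn't handle elements of list columns the same way it handles vector columns. In the question's code, unlist() flattens the pieces from every row into one long vector, so [1] and [2] only ever pick the first row's two values, which mutate() then recycles down the column. Calling dplyr::rowwise() before you mutate/unlist makes each expression evaluate one row at a time: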

library(dplyr)
library(stringr)

orig <- tibble(sourceMedium = c('apples / pears', 'red / blue', 'green / grey',
                                'wet / dry', 'ear / nose', 'mac / linux'))

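wrangled <- orig %>%
    # split each string into its pieces; tempcol is a list column
    dplyr::mutate(tempcol = stringr::str_split(sourceMedium, ' / ')) %>%
    # evaluate the following mutate() one row at a time
    dplyr::rowwise() %>%
    dplyr::mutate(source = unlist(tempcol)[1], medium = unlist(tempcol)[2]) %>%
    # drop the temporary helper column
    dplyr::select(-tempcol)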
wrangled

Gives the following output:

Source: local data frame [6 x 3]
Groups: <by row>

# A tibble: 6 × 3
    sourceMedium source medium
           <chr>  <chr>  <chr>
1 apples / pears apples  pears
2     red / blue    red   blue
3   green / grey  green   grey
4      wet / dry    wet    dry
5     ear / nose    ear   nose
6    mac / linux    mac  linux
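For comparison, tidyr::separate() is a different technique from the rowwise() approach: it splits one character column into several in a single vectorised step and, with its default remove = TRUE, also drops the original column, which is what the question asked for. This is a sketch assuming the tidyr package is installed; the three-row tibble is an abbreviated copy of the sample data above. Note that recent tidyr releases (1.3.0 and later) mark separate() as superseded in favour of separate_wider_delim(), though separate() still works.

```r
library(dplyr)
library(tidyr)

# Abbreviated copy of the sample data from the answer
orig <- tibble(sourceMedium = c('apples / pears', 'red / blue', 'green / grey'))

# separate() splits sourceMedium on the regular expression " / " into the
# columns named in `into`; remove = TRUE (the default) drops sourceMedium
wrangled <- orig %>%
  separate(sourceMedium, into = c("source", "medium"), sep = " / ")

print(wrangled)
```

Because no rowwise grouping is involved, the result is an ordinary ungrouped tibble, so there is no need to call ungroup() before further steps in the chain.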
allen