
Before

+---------+------------------------------------+
|  Word   |                Tags                |
+---------+------------------------------------+
| morning | #sunrise #droplets #waterdroplets  |
| morning | #sky #ocean #droplets              |
+---------+------------------------------------+

After

+---------+---------------+
|  Word   |     Tags      |
+---------+---------------+
| morning | sunrise       |
| morning | droplets      |
| morning | waterdroplets |
| morning | sky           |
| morning | ocean         |
| morning | droplets      |
+---------+---------------+

Note that I want to keep `droplets` appearing twice. The table is very large (over 5 million rows), so an efficient method would be very helpful. Thanks!

Ian
  • [Split comma-separated strings in a column into separate rows](https://stackoverflow.com/questions/13773770/split-comma-separated-strings-in-a-column-into-separate-rows). Adjust the separator and `sub` away the `#`. – Henrik Feb 23 '19 at 19:42
  • If you need more help than Henrik's recommendation, you'll get it much faster if you give your sample input in R syntax. `dput()` is nice for making copy-pasteable R objects: if your `before` data frame is in R, then `dput(before[1:2, ])` will give a copy-pasteable version of the top 2 rows. If that looks really long and you have factors, try `dput(droplevels(before[1:2, ]))`. – Gregor Thomas Feb 23 '19 at 19:46

1 Answer


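We can use `separate_rows()` from tidyr: split `Tags` on the space before each `#`, then remove the `#` that remains on the first tag of each row.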

library(dplyr)
library(tidyr)

dat <- tribble(
  ~Word,   ~Tags,
  "morning", "#sunrise #droplets #waterdroplets",
  "morning", "#sky #ocean #droplets"
)

# Put each tag on its own row, then strip the "#" left on the first tag
dat2 <- dat %>%
  separate_rows(Tags, sep = " #") %>%
  mutate(Tags = gsub("#", "", Tags))
dat2
# # A tibble: 6 x 2
#   Word    Tags         
#   <chr>   <chr>        
# 1 morning sunrise      
# 2 morning droplets     
# 3 morning waterdroplets
# 4 morning sky          
# 5 morning ocean        
# 6 morning droplets   
www
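
Since the question mentions a table of over 5 million rows, a dependency-free variant may be worth benchmarking against `separate_rows()`. The following is only a sketch in base R, not something taken from the answer above: the data frame name `before` and the sample rows are assumptions that mirror the question's "Before" table, and no speed-up is claimed without measuring on the real data.

```r
# Hypothetical sample data mirroring the question's "Before" table
before <- data.frame(
  Word = c("morning", "morning"),
  Tags = c("#sunrise #droplets #waterdroplets", "#sky #ocean #droplets"),
  stringsAsFactors = FALSE
)

# Split each Tags string on single spaces; fixed = TRUE skips the regex engine
tag_list <- strsplit(before$Tags, " ", fixed = TRUE)

# Repeat each Word once per tag, flatten the tags, and drop the leading "#".
# Duplicates such as "droplets" are kept because nothing is de-duplicated.
after <- data.frame(
  Word = rep(before$Word, times = lengths(tag_list)),
  Tags = sub("#", "", unlist(tag_list), fixed = TRUE),
  stringsAsFactors = FALSE
)
after
```

The idea is that `lengths(tag_list)` gives the number of tags per original row, so `rep()` can expand `Word` in one vectorised call instead of looping over rows.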