Separate a character column and convert to tidy format

Question

I have a dataset, in which hashtags columns contains variable number of hashtags separated by ;. For example:

id  hashtags
1   #apple;#peach           
2   #apple          
3   #pear;#orange;#banana

What I need to do is separate the column and convert it to tidy data. So each row contains id and one single hashtag.

id  hashtag
1   #apple          
1   #peach          
2   #apple          
3   #pear           
3   #orange         
3   #banana

I can do that with loop like below.

library(tidyverse)
df = data.frame(id=1:3, hashtags=c("#apple;#peach", "#apple", "#pear;#orange;#banana"))

dat = data.frame()
for (i in 1:nrow(df)) {
  dt = data.frame(id = df$id[i], 
                  hashtag = str_split(df$hashtags[i], ';')[[1]])
  dat = bind_rows(dat, dt)
}

But I think the way above is not a good solution. I found it's extremely slow when I have a very large dataset (my actual data).

Does anyone have a better and much quicker way to achieve that? Note that the number of hashtags for each id varies a lot. tidyr::separate does not seems to apply here.

Look into `separate_rows` – arg0naut91 Aug 06 '19 at 15:47 — arg0naut91, Aug 06 '19 at 15:47

score 5 · Answer 1 · answered Aug 06 '19 at 15:48

An option is separate_rows and specify the sep

library(tidyverse)
df %>% 
   separate_rows(hashtags, sep=";")
#. id hashtags
#1  1   #apple
#2  1   #peach
#3  2   #apple
#4  3    #pear
#5  3  #orange
#6  3  #banana

If we don't specify the sep, it will automatically select all the possible delimiters found in the data

Separate a character column and convert to tidy format

1 Answers1