I have a dataset, in which hashtags columns contains variable number of hashtags separated by ;
. For example:
id hashtags
1 #apple;#peach
2 #apple
3 #pear;#orange;#banana
What I need to do is separate the column and convert it to tidy data. So each row contains id and one single hashtag.
id hashtag
1 #apple
1 #peach
2 #apple
3 #pear
3 #orange
3 #banana
I can do that with loop like below.
library(tidyverse)
df = data.frame(id=1:3, hashtags=c("#apple;#peach", "#apple", "#pear;#orange;#banana"))
dat = data.frame()
for (i in 1:nrow(df)) {
dt = data.frame(id = df$id[i],
hashtag = str_split(df$hashtags[i], ';')[[1]])
dat = bind_rows(dat, dt)
}
But I think the way above is not a good solution. I found it's extremely slow when I have a very large dataset (my actual data).
Does anyone have a better and much quicker way to achieve that? Note that the number of hashtags for each id varies a lot. tidyr::separate
does not seems to apply here.