2

I have the following data.frame:

id name shoes
1  Pete "nike; adidas; puma"
2  Anna nike

base on this data frame I would like to create the followin data.frame

id name shoes
1 Pete  nike
1 Pete  adidas
1 Pete  puma
2 Ana   nike

Any help would be appreciated.

2 Answers2

5

The tidyverse's separate_rows function is perfect for this. By default, it separates a string whenever it encounters any non-alphanumeric characters, which includes ; in your example.

library(tidyverse)

df <- data.frame(name = c('Pete', 'Anna'), shoes = c('nike; adidas; puma', 'nike'))

df_long <- df %>% 
  separate_rows(shoes)

  name  shoes 
  <chr> <chr> 
1 Pete  nike  
2 Pete  adidas
3 Pete  puma  
4 Anna  nike 
jdobres
  • 11,339
  • 1
  • 17
  • 37
  • 1
    It should be **tidyr**'s `separate_rows()`. Saying `tidyverse` is not precise. – Darren Tsai Jan 15 '23 at 16:06
  • Fair point. Notably, the tidyverse as a whole imports magrittr's `%>%` operator (through dplyr, I believe). While I know these are separate packages with different functions, I tend to just import the entire tidyverse suite, since most common workflows will use multiple components from it anyway. – jdobres Jan 15 '23 at 16:17
  • Nice discussion. `tidyr` also imports magrittr's `%>%` operator, so even if you `library(tidyr)` solely, your code can still work. In my experience on SO, I will say that I'm using a `tidyverse` solution when I need to library more than one packages that are included in `tidyverse` to finish a job. – Darren Tsai Jan 16 '23 at 08:33
0

An approach using unnest after strsplit

library(dplyr)
library(tidyr)

df %>% 
  mutate(shoes = strsplit(shoes, "; ")) %>% 
  unnest(shoes)
# A tibble: 4 × 3
     id name  shoes
  <int> <chr> <chr>
1     1 Pete  nike
2     1 Pete  adidas
3     1 Pete  puma
4     2 Anna  nike

Data

df <- structure(list(id = 1:2, name = c("Pete", "Anna"), shoes = c("nike; adidas; puma",
"nike")), class = "data.frame", row.names = c(NA, -2L))
Andre Wildberg
  • 12,344
  • 3
  • 12
  • 29