
I have a data frame with a column that holds comma-separated lists of countries of varying lengths. I split them into separate columns using the separate() function from the tidyr package. The code is as follows:

library(dplyr)
library(tidyr)  # separate() is provided by tidyr, not dplyr
df <- data.frame(countries=(c("England","Australia,Pakistan", "India,England","Denmark", "",
                             "Australia, Pakistan, New Zealand, England", "United States, England, Pakistan")))
wrangled_df <- df %>%
    separate(countries,
             into = c("country_1", "country_2", "country_3","country_4"),
             sep = ",", remove = TRUE)
wrangled_df

The output is as follows:

      country_1 country_2   country_3 country_4
1       England      <NA>        <NA>      <NA>
2     Australia  Pakistan        <NA>      <NA>
3         India   England        <NA>      <NA>
4       Denmark      <NA>        <NA>      <NA>
5                    <NA>        <NA>      <NA>
6     Australia  Pakistan New Zealand   England
7 United States   England    Pakistan      <NA>

This works fine as long as the number of comma-separated values is small and known in advance (at most 4 in this case). However, if a row lists many countries, naming every output column by hand in separate() becomes cumbersome. Is there a simpler way to split the data on commas automatically and create as many columns as are needed?
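
To make the goal concrete, here is a minimal sketch of one possible approach, not necessarily the best one: count the commas in each row, take the largest count plus one as the number of columns, and generate the column names with paste0(). The helper name n_cols and the sep pattern ",\\s*" (which also drops the space after each comma) are my own choices for illustration.

```r
library(tidyr)  # provides separate()

df <- data.frame(countries = c("England", "Australia,Pakistan", "India,England", "Denmark", "",
                               "Australia, Pakistan, New Zealand, England",
                               "United States, England, Pakistan"),
                 stringsAsFactors = FALSE)

# Number of pieces in a row = number of commas + 1; take the maximum over all rows
comma_hits <- regmatches(df$countries, gregexpr(",", df$countries, fixed = TRUE))
n_cols <- max(lengths(comma_hits)) + 1

wrangled_df <- separate(df, countries,
                        into = paste0("country_", seq_len(n_cols)),  # country_1 ... country_n
                        sep = ",\\s*",     # split on a comma plus any following whitespace
                        fill = "right")    # pad short rows with NA on the right, without a warning
wrangled_df
```

Because the column names are built from n_cols, the same code should keep working if a row with more countries is added. Recent versions of tidyr also offer separate_wider_delim(), which may be worth checking as an alternative.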
