There is a data frame in which each row holds a comma-separated list of countries, and the lists vary in length. The lists have been split into columns using the separate() function from the tidyr package (dplyr is loaded only for the %>% pipe). The code is as follows:
library(dplyr)
library(tidyr)  # separate() is exported by tidyr, not dplyr
df <- data.frame(countries = c("England", "Australia,Pakistan", "India,England", "Denmark", "",
                               "Australia, Pakistan, New Zealand, England",
                               "United States, England, Pakistan"))
wrangled_df <- df %>%
  separate(countries,
           into = c("country_1", "country_2", "country_3", "country_4"),
           sep = ",", remove = TRUE)
wrangled_df
The output is as follows:
      country_1 country_2    country_3 country_4
1       England      <NA>         <NA>      <NA>
2     Australia  Pakistan         <NA>      <NA>
3         India   England         <NA>      <NA>
4       Denmark      <NA>         <NA>      <NA>
5                    <NA>         <NA>      <NA>
6     Australia  Pakistan  New Zealand   England
7 United States   England     Pakistan      <NA>
This works fine as long as the maximum number of comma-separated values per row is small and known in advance (in this case 4).
However, if a row contains many comma-separated countries, it becomes cumbersome to spell out every column name in the into argument of separate().
Is there a simpler way of doing this, so that the data is split on commas automatically and the necessary number of columns is created?
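For illustration, one possible direction is to count the pieces per row first and build the into vector from that count. This is only a sketch under a few assumptions: the names pieces and n_cols are made up for this example, the separator is taken to be a comma optionally followed by spaces, and fill = "right" is used to pad shorter rows with NA without a warning.

```r
library(tidyr)

df <- data.frame(countries = c("England", "Australia,Pakistan", "India,England", "Denmark", "",
                               "Australia, Pakistan, New Zealand, England",
                               "United States, England, Pakistan"),
                 stringsAsFactors = FALSE)

# Split every row on a comma plus any following spaces and count the pieces.
pieces <- strsplit(df$countries, ",\\s*")

# The widest row decides how many columns are needed (at least one).
n_cols <- max(lengths(pieces), 1)

# Generate country_1 ... country_n instead of typing the names by hand.
wrangled_df <- separate(df, countries,
                        into = paste0("country_", seq_len(n_cols)),
                        sep = ",\\s*", fill = "right", remove = TRUE)
wrangled_df
```

With the sample data this should produce the same four columns as the hand-written version, but the column count follows the data rather than being fixed in the code.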