Split long string into multiple dataframe columns while not splitting across a word

Question

I can split a long string into 40-character columns using the following:

temp_df <- data.frame(
  long_string_column = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Whatever ornare nunc tellus, nec convallis enim viverra sit amet."
)


library(tidyr)
temp_df_new <- separate(temp_df, 
         long_string_column, 
         into = c("split1", "split2", "split3", "split4", "split5"), 
         sep = c(40, 80, 120, 160),
         remove = FALSE)

However, this splits in the middle of words, so half of a word can end up in one column and the other half in the next.

Is there any way to ensure that words are not split across columns?

Please add a sample of your [data](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) — NelsonGon, Jun 17 '20 at 02:07

score 1 · Accepted Answer · answered Jun 17 '20 at 05:33

You can use str_wrap() to insert newline characters at word boundaries and then separate on those newlines. This avoids breaking up words, and each new column should have <= 40 characters. There may be exceptions depending on the nature of the original strings: for example, a single word longer than 40 characters is kept whole and overflows its column.

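library(stringr)
library(dplyr)
library(tidyr)

# Insert "\n" at word boundaries so each line fits the 40-character target
temp_df <- temp_df %>%
  mutate(tmp = str_wrap(long_string_column, 40))

# Number of columns needed: the largest line count found in any row
cols <- seq(max(str_count(temp_df$tmp, "\n") + 1))

# Split on the inserted newlines, then drop the helper column.
# fill = "right" pads rows that wrap to fewer lines with NA without a warning.
temp_df %>%
  separate(tmp, 
           into = paste0("split_", cols), 
           sep = "\n",
           fill = "right",
           remove = FALSE) %>%
  select(-tmp)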
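
If the stringr and tidyr dependencies are not available, the same wrap-then-split idea can be sketched in base R. This is only a minimal sketch and not part of the answer above: split_at_words() is a hypothetical helper name, and it swaps str_wrap() for base R's strwrap(), which also breaks lines only at whitespace. Rows that wrap to fewer lines are padded with NA.

```r
# Hypothetical helper (an assumption, not from the answer): wrap each string
# at word boundaries with base R's strwrap() and return one column per line.
split_at_words <- function(x, width = 40, prefix = "split_") {
  x <- as.character(x)
  # One character vector of wrapped lines per input string.
  # Words longer than `width` are kept whole, so they can exceed the limit.
  pieces <- lapply(x, strwrap, width = width)
  n_cols <- max(lengths(pieces), 1L)
  # Pad shorter rows with NA so every row has the same number of columns.
  padded <- lapply(pieces, function(p) {
    c(p, rep(NA_character_, n_cols - length(p)))
  })
  out <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
  names(out) <- paste0(prefix, seq_len(n_cols))
  out
}

temp_df <- data.frame(
  long_string_column = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Whatever ornare nunc tellus, nec convallis enim viverra sit amet.",
  stringsAsFactors = FALSE
)

# Keep the original column and append the split_* columns next to it.
result <- cbind(temp_df, split_at_words(temp_df$long_string_column, width = 40))
```

Because the column count is derived from the data, the helper works for any number of rows, at the cost of NA padding when the strings differ a lot in length.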
