
I am able to split a long string into 40-character columns using the following:

temp_df <- data.frame(
  long_string_column = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Whatever ornare nunc tellus, nec convallis enim viverra sit amet."
)


library(tidyr)
temp_df_new <- separate(temp_df, 
         long_string_column, 
         into = c("split1", "split2", "split3", "split4", "split5"), 
         sep = c(40, 80, 120, 160),
         remove = FALSE) 

However, this splits in the middle of words, so half of a word can end up in one column and the other half in the next.


Is there any way to ensure that words are not split across columns?

Scott
  • Please add a sample of your [data](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – NelsonGon Jun 17 '20 at 02:07

1 Answer


You can use str_wrap() to insert newlines at word boundaries and then separate on those newline characters. This avoids breaking up words, and each new column should hold at most 40 characters. The exception is a single word longer than 40 characters, which is kept whole and so overflows its column.

library(stringr)
library(dplyr)
library(tidyr)

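# Insert newlines at word boundaries, targeting lines of at most 40 characters
temp_df <- temp_df %>%
  mutate(tmp = str_wrap(long_string_column, 40))

# Number of columns needed: the largest line count across all rows
cols <- seq_len(max(str_count(temp_df$tmp, "\n")) + 1)

# Split on the newlines; rows with fewer lines are padded with NA on the right
temp_df %>%
  separate(tmp, 
           into = paste0("split_", cols), 
           sep = "\n",
           fill = "right",
           remove = FALSE) %>%
  select(-tmp)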
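
To check the width claim on your own data, a minimal sketch along these lines may help. It wraps the example string from the question, measures each piece, and then shows the overflow case with a single long word. The variable names (long_string, pieces, piece_widths, overflow_widths) and the test word are made up for illustration and are not part of the original answer.

```r
library(stringr)

long_string <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Whatever ornare nunc tellus, nec convallis enim viverra sit amet."

# Wrap at 40 characters, then split on the newlines that str_wrap() inserted
pieces <- str_split(str_wrap(long_string, width = 40), "\n")[[1]]

# Width of each piece; for this input none should exceed 40
piece_widths <- nchar(pieces)
print(data.frame(piece = pieces, width = piece_widths))

# Overflow case (illustrative input): a word longer than the width is not broken
wrapped_long_word <- str_wrap("short supercalifragilisticexpialidocious short", width = 10)
overflow_widths <- nchar(str_split(wrapped_long_word, "\n")[[1]])
print(overflow_widths)
```

If any value in piece_widths exceeds the target width for your data, the cause is most likely a word longer than the width, and you would need to decide separately how to handle it (for example, by truncating or by allowing a wider column).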
Ritchie Sacramento