Flattening text rows in a data frame

Question

I have a dataframe with one column and rows like this:

row1:

something here

another line here
and we are here


but we also have this

row2:

something here2

another line here2



and we are here2


but we also have this2

Is it possible to remove the large gaps and flatten all the text in each row onto one line? The desired output is something like this:

row1: something here another line here and we are here but we also have this
row2: something here2 another line here2 and we are here2 but we also have this2

score 0 · Accepted Answer · answered Mar 17 '17 at 14:53

Try this on your rows to remove carriage returns and newlines from your string:

library(stringr)
str_replace_all(x, "[\r\n]" , "")

where x is your string.

Loay Ashmawy


The result from this isn't quite what was asked for: words that should have spaces between them are pasted together. – Mike H. Mar 17 '17 at 15:03
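To illustrate the point in the comment, here is a minimal base R sketch. The sample string `x` is made up for the example (modelled on row2 from the question), and `gsub` is used in place of `str_replace_all` only so the snippet runs without extra packages:

```r
# Hypothetical sample string, modelled on row2 from the question
x <- "something here2\n\nanother line here2\n\n\n\nand we are here2"

# Deleting line breaks outright joins the neighbouring words
pasted <- gsub("[\r\n]", "", x)

# Replacing each run of line breaks with one space keeps the words apart
spaced <- gsub("[\r\n]+", " ", x)

print(pasted)
print(spaced)
```

The only difference is the replacement string: an empty string removes the separator between lines, while a single space preserves it.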

score 0 · Answer 2 · edited May 23 '17 at 12:17

It looks like you want to collapse each run of whitespace into a single space. Something like this, from the SO question "Merge Multiple spaces to single space; remove trailing/leading spaces", should give the desired result:

string<-"something here2

another line here2



and we are here2


but we also have this2
"

library(stringr)
# Trim leading/trailing whitespace, then collapse each whitespace run to one space
gsub("\\s+", " ", str_trim(string))

##[1] "something here2 another line here2 and we are here2 but we also have this2"

For a data frame:

df<-structure(list(strings = structure(c(2L, 1L), .Label = c("something here\n\nanother line here\n\n\n\nand we are here\n\n\nbut we also have this\n", 
                                                             "something here2\n\nanother line here2\n\n\n\nand we are here2\n\n\nbut we also have this2\n"
               ), class = "factor"), strings_cl = c("something here2 another line here2 and we are here2 but we also have this2", 
                                     "something here another line here and we are here but we also have this"
               )), .Names = c("strings", "strings_cl"), row.names = c(NA, -2L
               ), class = "data.frame")

# Apply the same clean-up to every row of the column
df$strings_cl <- gsub("\\s+", " ", str_trim(df$strings))
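If you would rather not depend on stringr, the same idea can be sketched in base R. This assumes R 3.2.0 or later for `trimws`; the data frame below is a made-up stand-in for the one in the question, and `as.character` is there only in case the column is a factor:

```r
# Hypothetical data frame with one text column, as in the question
df <- data.frame(
  strings = c(
    "something here\n\nanother line here\nand we are here\n\n\nbut we also have this\n",
    "something here2\n\nanother line here2\n\n\n\nand we are here2\n\n\nbut we also have this2\n"
  ),
  stringsAsFactors = FALSE
)

# Collapse every run of whitespace to one space, then trim both ends
df$strings_cl <- trimws(gsub("\\s+", " ", as.character(df$strings)))

print(df$strings_cl)
```

Newer versions of stringr also offer `str_squish()`, which does the trimming and collapsing in one call, if that package is already loaded.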
