I have a data frame called "stemmoutput" (see below):

     X1      X2       X3      X4      X5      X6      X7     X8     X9    X10     
1  tanaman  cabai                                    
2  banget   hama     sakit   tanaman                            
3  koramil  nogosari melaks  ecek     hama   tanaman padi    ppl    ds   rambun

I want to merge the values of multiple columns into one column, like this:

     TEXT
1  tanaman cabai                                     
2  banget hama sakit tanaman                            
3  koramil nogosari melaks ecek hama tanaman padi ppl ds rambun 

I have tried this code, and it works:

stemmoutput$TEXT <- with(stemmoutput, paste(X1,X2,X3,X4,X5,X6,X7,X8,X9,X10, sep=" "))

But is there a more efficient way that does not require writing out each column name one by one?

I've also tried the loop below, but it didn't work:

for(i in names(stemmoutput)){
     stemmoutput$TEXT <- with(stemmoutput, paste(i, sep=" "))}
Ihda

2 Answers


Try do.call, which passes every column of the data frame to paste as a separate argument:

library(stringr)
newdat <- data.frame(TEXT=str_trim(do.call(paste, stemmoutput)),
                     stringsAsFactors=FALSE)

newdat
#                                                         TEXT
#1                                                tanaman cabai
#2                                    banget hama sakit tanaman
#3 koramil nogosari melaks ecek hama tanaman padi ppl ds rambun
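
As a rough illustration of why this works, here is a minimal base R sketch. It assumes the empty cells are empty strings ("") rather than NA, uses trimws (base R 3.2.0 or later) in place of str_trim, and rebuilds only a small, made-up subset of the question's data:

```r
# Hypothetical reconstruction of part of the question's data;
# empty cells are assumed to be "" rather than NA.
stemmoutput <- data.frame(
  X1 = c("tanaman", "banget"),
  X2 = c("cabai", "hama"),
  X3 = c("", "sakit"),
  X4 = c("", "tanaman"),
  stringsAsFactors = FALSE
)

# do.call(paste, df) behaves like paste(df$X1, df$X2, df$X3, df$X4):
# each column becomes one argument, and paste works element-wise (row by row).
pasted <- do.call(paste, stemmoutput)

# The empty cells leave trailing spaces behind; trimws() strips them.
TEXT <- trimws(pasted)
```

Because the column list is built from the data frame itself, the same call works unchanged for any number of columns.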

It may be better to use ", " as the delimiter if there are multi-part words within a column:

 TEXT <- gsub(', [^A-Za-z]+', '', do.call(paste, c(stemmoutput, sep=', ')))

 newdat <- data.frame(TEXT, stringsAsFactors=FALSE)
 newdat
 #                                                                  TEXT
 #1                                                        tanaman, cabai
 #2                                          banget, hama, sakit, tanaman
 #3 koramil, nogosari, melaks, ecek, hama, tanaman, padi, ppl, ds, rambun
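
If the regular expression feels fragile, one alternative is to drop the empty cells before pasting. This is only a sketch: it assumes all columns are character and that empty cells are "" (NA is skipped too), and join_nonempty is a hypothetical helper name, not part of any package:

```r
# Hypothetical subset of the question's data, with "" for empty cells.
stemmoutput <- data.frame(
  X1 = c("tanaman", "banget"),
  X2 = c("cabai", "hama"),
  X3 = c("", "sakit"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: join only the non-empty, non-NA values of one row.
join_nonempty <- function(values, sep = ", ") {
  keep <- !is.na(values) & nzchar(values)
  paste(values[keep], collapse = sep)
}

# Walk the rows by index; unlist() turns a one-row data frame into a vector.
TEXT <- vapply(
  seq_len(nrow(stemmoutput)),
  function(i) join_nonempty(unlist(stemmoutput[i, ], use.names = FALSE)),
  character(1)
)
```

This loops over rows in R, so on large data it will likely be slower than the vectorised do.call(paste, ...) call; it trades speed for not having to clean up delimiters afterwards.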
akrun

Here's another idea using tidyr

If you want to unite only columns from X1 to X10 you could do:

library(tidyr)
unite(stemmoutput, TEXT, num_range("X", 1:10), sep = " ")

If you want to unite all columns, do:

unite(stemmoutput, TEXT, everything(), sep = " ")
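
Two details are worth knowing when trying this: unite drops the input columns by default (remove = TRUE), and empty cells still contribute a separator. A small sketch, assuming the empty cells are "" and using a made-up subset of the question's data:

```r
library(tidyr)

# Hypothetical subset of the question's data, with "" for empty cells.
stemmoutput <- data.frame(
  X1 = c("tanaman", "banget"),
  X2 = c("cabai", "hama"),
  X3 = c("", "sakit"),
  stringsAsFactors = FALSE
)

# remove = FALSE keeps X1..X3 alongside the new TEXT column.
out <- unite(stemmoutput, TEXT, everything(), sep = " ", remove = FALSE)

# Empty cells leave trailing spaces, so trim them afterwards.
out$TEXT <- trimws(out$TEXT)
```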

Benchmarks

I benchmarked the two approaches because I suspected unite would be much faster than do.call, but they ended up fairly close (unite was about 18% faster at the median):

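library(microbenchmark)
library(stringr)
library(tidyr)

# 10 columns, each holding one million copies of a single random 10-letter string
df <- data.frame(replicate(10,sample(paste0(
  sample(LETTERS[1:10]), collapse = ""), 10e5, replace = TRUE)))

mbm <- microbenchmark(
  akrun = data.frame(TEXT=str_trim(do.call(paste, df)), stringsAsFactors=FALSE),
  steven = unite(df, TEXT, everything(), sep = " "),
  times = 50
)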

# Unit: milliseconds
#    expr       min        lq      mean    median       uq       max neval cld
#   akrun 1117.1350 1132.3861 1146.3943 1136.3094 1145.076 1232.5633    50   b
#  steven  910.7432  924.0386  927.8614  927.7224  929.649  995.3584    50  a
Steven Beaupré