
Hi everybody, I am trying to optimize the R code below, which uses nested for loops, because it takes a long time to execute. I even tried the compiler package in R to turn the function into bytecode, but the performance got even worse. Is there any way to write this code with the apply functions?

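library(magrittr)  # provides the %>% pipe used below

word_separation<-function(inp_data){
df<-NULL
for(k in 1:nrow(inp_data)){
    # split the k-th row into its comma-separated words
    vec<-unlist(strsplit(as.vector(inp_data[k,]),split=","))
        if(length(vec)==1){
            # a single word is paired with itself
            df<-rbind(df,data.frame(first_col=vec,second_col=vec))
        }else{
            temp_df<-NULL
            for(i in 2:length(vec)){
                for(j in i:length(vec)){
                    # pair the first word with the run of words from position i to j
                    temp_df<-rbind(temp_df,data.frame(first_col=vec[1],second_col=paste(vec[i:j],collapse=",")))
                }
                df<-rbind(df,temp_df)
                # drop empty strings and duplicated rows
                df[df==""]<-NA
                df<-df %>% unique() %>% na.omit()
            }
        }
    }
    return(df)
}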

Here, my inp_data data frame has a single column with the following data:

    column
Milk,Bread,Eggs,Jam
Apple,Milk,Beer

When inp_data is passed to the function, it returns a data frame with two columns: the first column holds the first word of each row, and the second column holds the combinations of the remaining words of that row.

 first_col     second_col
   Milk          Bread
   Milk     Bread,Eggs
   Milk Bread,Eggs,Jam
   Milk           Eggs
   Milk       Eggs,Jam
   Milk            Jam
   Apple           Milk
   Apple      Milk,Beer
   Apple           Beer
guravaraju
  • How about a description of what the method should do? – MrSmith42 Mar 13 '17 at 16:48
  • You are doing just about everything possible to make this run slowly. Always pre-allocate storage for the loop (set `df` to some size and fill it in; if you don't know how big to make `df` at the start, increase its size only when you get close to filling it); use a matrix or a list, not a data frame; *never* grow objects at each iteration in a loop - you fill objects in but never grow them (except if you've filled them up and haven't finished iterating); do as few operations/function calls as possible inside a loop - move operations to higher levels or outside the loop to avoid too many calls – Gavin Simpson Mar 13 '17 at 17:35
  • Thanks for the reply. Can you please elaborate or give some examples of how to alter the code? – guravaraju Mar 13 '17 at 18:18
  • Also, you can have a look at [this Q&A](http://stackoverflow.com/q/28983292/4137985) to learn more about the `-apply` functions – Cath Mar 14 '17 at 10:20
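
The pre-allocation advice in the comments can be illustrated with a minimal base R sketch. It is only one possible reading of that advice, not code from any of the commenters. The function name `word_separation_prealloc` is hypothetical, and the sketch assumes that every row contains at least one word and no empty words (so the `NA` clean-up of the original is left out). Each row's result goes into a slot of a list that is sized up front, and `rbind()` is called once at the end instead of once per iteration.

```r
# Hypothetical rewrite of word_separation(): fill pre-allocated storage,
# then combine everything with a single rbind() call.
word_separation_prealloc <- function(inp_data) {
  # split all rows at once, outside the loop
  rows <- strsplit(as.character(inp_data[[1]]), ",", fixed = TRUE)
  pieces <- vector("list", length(rows))  # one slot per input row

  for (k in seq_along(rows)) {
    vec <- rows[[k]]
    n <- length(vec)
    if (n == 1) {
      # a single word is paired with itself, as in the original function
      pieces[[k]] <- data.frame(first_col = vec, second_col = vec,
                                stringsAsFactors = FALSE)
      next
    }
    # vec[2..n] has (n - 1) * n / 2 contiguous runs, so size the vector once
    second <- character((n - 1) * n / 2)
    pos <- 0
    for (i in 2:n) {
      for (j in i:n) {
        pos <- pos + 1
        second[pos] <- paste(vec[i:j], collapse = ",")
      }
    }
    pieces[[k]] <- data.frame(first_col = vec[1], second_col = second,
                              stringsAsFactors = FALSE)
  }
  # combine once and drop duplicated rows, as the original does
  unique(do.call(rbind, pieces))
}

inp_data <- data.frame(column = c("Milk,Bread,Eggs,Jam", "Apple,Milk,Beer"),
                       stringsAsFactors = FALSE)
res <- word_separation_prealloc(inp_data)
```

For the two sample rows, `res` should contain the same nine rows as the expected output in the question, in the same order.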

1 Answer


The OP has specified that the input data consists of a single column. So we need to split the column before creating the combinations. (The answer given by Sathish has silently skipped this step.)

The data.table solution below uses only one lapply().

Data

Edit: Added row with only one word

library(data.table)
inp_data <- fread("    column
Milk,Bread,Eggs,Jam
Apple,Milk,Beer
Butter", sep = "\n")

Code

# split strings, output in long format, add row number for later join
molten <- inp_data[, rn := .I][, strsplit(column, ","), by = rn]
# create combinations of all words (except the first one)
combined <- molten[, unlist(
  lapply(seq_len(.N - 1), function(.i) as.data.table(
    combn(V1[-1], .i, paste, collapse = ",", simplify = TRUE)))), by = rn]
# right join
combined[molten[, .(rn, first_col = first(V1)), by = rn], 
         .(rn, first_col, second_col = V1), on = "rn"]
#    rn first_col     second_col
# 1:  1      Milk          Bread
# 2:  1      Milk           Eggs
# 3:  1      Milk            Jam
# 4:  1      Milk     Bread,Eggs
# 5:  1      Milk      Bread,Jam
# 6:  1      Milk       Eggs,Jam
# 7:  1      Milk Bread,Eggs,Jam
# 8:  2     Apple           Milk
# 9:  2     Apple           Beer
#10:  2     Apple      Milk,Beer
#11:  3    Butter             NA

Edit: Changed join to ensure that rows consisting of only one word are included as well.
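
Note that `combn()` returns every combination of the remaining words, so the result above contains `Bread,Jam`, which does not appear in the expected output of the question. The loops in the question only build contiguous runs of words. If only those runs are wanted, the following base R sketch is one way to get them with `lapply()` and `vapply()`. The helper names `contiguous_runs` and `word_runs` are made up for this illustration, and rows with a single word get `NA` in `second_col`, following the output above rather than the question's function.

```r
# Hypothetical helper: all contiguous runs of a character vector,
# ordered by start position and then by end position.
contiguous_runs <- function(words) {
  n <- length(words)
  if (n == 0L) return(character(0))
  starts <- rep(seq_len(n), times = rev(seq_len(n)))      # n = 3: 1 1 1 2 2 3
  ends   <- unlist(lapply(seq_len(n), function(i) i:n))   # n = 3: 1 2 3 2 3 3
  vapply(seq_along(starts),
         function(k) paste(words[starts[k]:ends[k]], collapse = ","),
         character(1))
}

# Hypothetical wrapper: first word of each row paired with every
# contiguous run of the remaining words.
word_runs <- function(inp_data) {
  rows <- strsplit(as.character(inp_data[[1]]), ",", fixed = TRUE)
  out <- lapply(rows, function(vec) {
    rest <- vec[-1]
    second <- if (length(rest) == 0L) NA_character_ else contiguous_runs(rest)
    data.frame(first_col = vec[1], second_col = second,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, out)
}

runs <- word_runs(data.frame(
  column = c("Milk,Bread,Eggs,Jam", "Apple,Milk,Beer", "Butter"),
  stringsAsFactors = FALSE))
```

With the three sample rows, `runs` should have ten rows: six for Milk, three for Apple, and one for Butter with `NA` in `second_col`.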

Uwe