I have a data frame with two million observations.

A sample of the data is given in the following table.

Pid Feature     Value
1   color       Red
1   size        10
1   weight      High
2   angle       90
2   temperature It works with low temperature
2   wheel       No
3   dimensions  23ft x 23 ft

I want to concatenate the features and their values for each Pid, producing the following data frame:

Pid Feature_list                Values
1   color, size, weight         Red, 10, High
2   angle, temperature, wheel   90, It works with low temperature, No
3   dimensions                  23ft x 23 ft

I used the foreach and paste commands in R. Here is an example of the code I used.

 foreach( #all products# ) %dopar%
   {
  ... 
     feature_sum <- rbind(feature_sum, pid, paste(att[att$pid == pid, ][2][, ], collapse = " "), paste(att[att$pid == pid, ][3][, ], collapse = " "))
   }

But the problem is that it takes too long to process the data into the desired format.

Is there any way to speed up the processing? Or can I avoid the foreach loop?

enggiqbal

1 Answer

We can use data.table to group by Pid and collapse each remaining column into a single string:

library(data.table)
setDT(df1)[ ,lapply(.SD, toString) , by = Pid]
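
As a rough illustration of how that one-liner behaves, here is a minimal sketch that rebuilds the sample rows from the question and applies the same call. The name `res` and the final `setnames()` renaming step are assumptions added for illustration; they are not part of the original answer.

```r
library(data.table)

# Sample data copied from the question; df1 is the name used in the answer.
df1 <- data.frame(
  Pid     = c(1, 1, 1, 2, 2, 2, 3),
  Feature = c("color", "size", "weight", "angle", "temperature", "wheel", "dimensions"),
  Value   = c("Red", "10", "High", "90", "It works with low temperature", "No", "23ft x 23 ft"),
  stringsAsFactors = FALSE
)

# Convert to a data.table by reference, then collapse every non-grouping
# column (.SD) into one comma-separated string per Pid.
res <- setDT(df1)[, lapply(.SD, toString), by = Pid]

# Optional (assumption): rename the columns to match the desired output.
setnames(res, c("Feature", "Value"), c("Feature_list", "Values"))

print(res)
```

Because the grouping is done in a single vectorised call, no explicit loop over products and no repeated `rbind` is needed, which is likely where most of the time in the `foreach` version goes.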
akrun
    `toString` is a new one... I note that it will accomplish the same as `lapply(.SD, paste, collapse = ",")` since we don't use the `width` argument – MichaelChirico Feb 11 '16 at 16:25
    @MichaelChirico Yes, except that it uses `collapse = ", "` (with a space after the `,`) – akrun Feb 11 '16 at 16:28
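
The point made in these two comments can be checked with a short base R sketch. The vector `x` and the variable names are made up for illustration only.

```r
# Hypothetical example vector, not taken from the question's data frame.
x <- c("color", "size", "weight")

a <- toString(x)                         # default separator is ", "
b <- paste(x, collapse = ", ")           # comma followed by a space
c_no_space <- paste(x, collapse = ",")   # comma only

identical(a, b)           # TRUE
identical(a, c_no_space)  # FALSE: the separator lacks the space
```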