0

I have a column with common levels as reps (1-4). I have data that goes with them in col3. Some of the levels don't contain information but for the ones that do, I would like to merge the values into one column by each common level in col1. The values in col3 are not consistent.

I have tried removing duplicates but this does not merge col3 values.

train <- data.table(col1=c(rep('a0001',4),rep('b0002',4)), col2=c(seq(1,4,1),seq(1,4,1)), col3=c("12 43 543 1232 43 543", "","","","15 24 85 64 85 25 46","","658 1568 12 584 15684",""))

this is reproducible code I have about 40000 lines to do.

result<-data.frame(col1=c("a0001","b0002"),col3=c("12 43 543 1232 43 543",'15 24 85 64 85 25 46 658 1568 12 584 15684'))

This is the result I am looking for...

FrosyFeet456
  • 349
  • 2
  • 12
  • Very similar to https://stackoverflow.com/questions/15933958/collapse-concatenate-aggregate-a-column-to-a-single-comma-separated-string-w , after taking a selection of rows where `col3 != ""` – thelatemail Sep 13 '19 at 03:37

2 Answers2

1

We can bring col3 values into separate_rows, remove empty values, group_by col1 and paste col3 values together.

library(dplyr)

train %>%
   tidyr::separate_rows(col3) %>%
   filter(col3 != '') %>%
   group_by(col1) %>%
   summarise(col3 = paste(col3, collapse = " "))

# col1  col3                                      
#  <chr> <chr>                                     
#1 a0001 12 43 543 1232 43 543                     
#2 b0002 15 24 85 64 85 25 46 658 1568 12 584 15684
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
1

I am learning from @Ronak Shah's answer. This could be a variation:

library(dplyr)
train %>% group_by(col1) %>% summarise(col3 = paste(col3, collapse = " "))

  col1  col3                                          
  <chr> <chr>                                         
1 a0001 "12 43 543 1232 43 543   "                    
2 b0002 "15 24 85 64 85 25 46  658 1568 12 584 15684 "
Zhiqiang Wang
  • 6,206
  • 2
  • 13
  • 27