I have a data.frame with a key column, fignum, and a data column, codefile. Values of fignum may be duplicated.

Where duplicates occur, I want to combine the codefile values into a single row, separated by ", ". Here's my input:

> cf
   fignum                       codefile
8     4.6           04_6-cholera-water.R
9     P.3 04_P3a-cholera-neighborhoods.R
10    P.3       04_P3b-SnowMap-density.R
11    5.5    05_5-playfair-east-indies.R

> duplicated(cf[,"fignum"])
[1] FALSE FALSE  TRUE FALSE
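
To make this reproducible, the input can be rebuilt along these lines. This is a sketch: the values and the row names 8 to 11 are copied from the printout above, while storing both columns as character (rather than factor) is an assumption.

```r
# Rebuild the example input; column types are assumed to be character.
cf <- data.frame(
  fignum   = c("4.6", "P.3", "P.3", "5.5"),
  codefile = c("04_6-cholera-water.R",
               "04_P3a-cholera-neighborhoods.R",
               "04_P3b-SnowMap-density.R",
               "05_5-playfair-east-indies.R"),
  row.names = 8:11,            # row names as shown in the printout
  stringsAsFactors = FALSE
)

# Flags the second "P.3" row as a duplicate key.
dup <- duplicated(cf[, "fignum"])
```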

The desired output combines the two "P.3" codefile values into one observation, to look like this:

> cf-wanted
   fignum                                                  codefile
8     4.6                                      04_6-cholera-water.R
9     P.3  04_P3a-cholera-neighborhoods.R, 04_P3b-SnowMap-density.R
10    5.5                               05_5-playfair-east-indies.R
user101089

1 Answer

We could group by fignum with group_by() and collapse each group's codefile values into one string with summarise():

library(dplyr)
cf %>% 
  group_by(fignum) %>% 
  summarise(codefile = paste0(codefile, collapse = ', '), .groups = 'drop')
  fignum codefile
  <chr>  <chr>                                                   
1 4.6    04_6-cholera-water.R                                    
2 5.5    05_5-playfair-east-indies.R                             
3 P.3    04_P3a-cholera-neighborhoods.R, 04_P3b-SnowMap-density.R
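
Note that the result above comes back sorted by fignum, whereas the desired output in the question keeps the input order (4.6, P.3, 5.5). If that order matters, one possible base R variant is sketched below. It is not the only way to do this; the name `res` is just illustrative, and the input is rebuilt here under the assumption that both columns are character.

```r
# Example input, rebuilt from the question (character columns assumed).
cf <- data.frame(
  fignum   = c("4.6", "P.3", "P.3", "5.5"),
  codefile = c("04_6-cholera-water.R",
               "04_P3a-cholera-neighborhoods.R",
               "04_P3b-SnowMap-density.R",
               "05_5-playfair-east-indies.R"),
  stringsAsFactors = FALSE
)

# Collapse codefile within each fignum; aggregate() sorts by the key.
res <- aggregate(codefile ~ fignum, data = cf, FUN = paste, collapse = ", ")

# Restore the order in which each fignum first appears in the input.
res <- res[match(unique(cf$fignum), res$fignum), ]
rownames(res) <- NULL
```

The `match(unique(...))` step is what puts the rows back in first-appearance order; without it the result has the same sorted order as the dplyr output.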
TarJae