How to reshape a data.frame in R without a loop?

Question

I have a data.frame in R. I need to compare two rows of the data, and if they are the same I need to merge the rows and combine the data in one column. I feel like this is a common need when working with R, so ddply or some other package should be able to accomplish this task. Below is the data as it is (dat) and what it should look like after some code (foo). I'm new to R, so any help is greatly appreciated.

Before:

 dat <- structure(list(detected_id = c(11, 11, 4), reviewer_name = c("mike", 
"mike", "john"), created_at = c("2016-05-04 10:02:45", "2016-05-04 10:02:45", 
"2016-05-04 10:02:45"), stage = c(2L, 2L, 1L), V7 = c("Detected Organism: Staphylococcus Aureus, Comment: Looks good", 
"Detected Organism: Staphylococcus Aureus, Comment: Note 1", 
"Detected Organism: Human Adenovirus 7, Comment: test")), .Names = c("detected_id", 
"reviewer_name", "created_at", "stage", "V7"), row.names = c(NA, 
-3L), class = "data.frame")

After:

foo <- structure(list(detected_id = c(11L, 4L), reviewer_name = c("mike", 
"john"), created_at = structure(c(1L, 1L), .Label = "5/4/16 10:02", class = "factor"), 
    stage = c(2L, 1L), V7 = structure(c(2L, 1L), .Label = c("Detected Organism: Human Adenovirus 7, Comment: test", 
    "Detected Organism: Staphylococcus Aureus, Comment: Looks good; Detected Organism: Staphylococcus Aureus, Comment: Note 1"
    ), class = "factor")), .Names = c("detected_id", "reviewer_name", 
"created_at", "stage", "V7"), row.names = c(NA, -2L), class = "data.frame")

EDIT:

The solutions below worked for the dataset I provided; however, I've found a case where they don't work as intended. Below is an example of a data.frame where they fail. Just a note: the detected_id column is not needed for me.

dat <- structure(list(detected_id = c(11, 11, 11, 11, 12, 4), reviewer_name = c("Mike", 
"Mike", "Mike", "Mike", "John", "John"), created_at = c("2016-05-04 10:02:45", 
"2016-05-04 10:02:45", "2016-05-04 10:02:45", "2016-05-04 10:02:45", 
"2016-05-04 10:02:45", "2016-05-04 10:02:45"), stage = c(2L, 
3L, 2L, 3L, 1L, 1L), V7 = c("Detected Organism: Staphylococcus Aureus, Comment: Looks good", 
"Detected Organism: Staphylococcus Aureus, Comment: Looks good", 
"Detected Organism: Staphylococcus Aureus, Comment: Note 1", 
"Detected Organism: Staphylococcus Aureus, Comment: Note 1", 
"Detected Organism: Stenotrophomonas Maltophilia, Comment: new note", 
"Detected Organism: Human Adenovirus 7, Comment: test")), .Names = c("detected_id", 
"reviewer_name", "created_at", "stage", "V7"), row.names = c(NA, 
-6L), class = "data.frame")

SOLUTION: remove the detected_id column before reshaping the data.frame. Thanks, @eddi!
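A minimal base R sketch of that fix, assuming the six-row `dat` from the edit above (rebuilt here with `data.frame()` so the block runs on its own). Once `detected_id` is dropped, the formula `V7 ~ .` groups by every remaining column, and `collapse = "; "` is chosen to match the separator used in the desired `foo`.

```r
# Edited example data from the question (six rows)
dat <- data.frame(
  detected_id   = c(11, 11, 11, 11, 12, 4),
  reviewer_name = c("Mike", "Mike", "Mike", "Mike", "John", "John"),
  created_at    = rep("2016-05-04 10:02:45", 6),
  stage         = c(2L, 3L, 2L, 3L, 1L, 1L),
  V7 = c("Detected Organism: Staphylococcus Aureus, Comment: Looks good",
         "Detected Organism: Staphylococcus Aureus, Comment: Looks good",
         "Detected Organism: Staphylococcus Aureus, Comment: Note 1",
         "Detected Organism: Staphylococcus Aureus, Comment: Note 1",
         "Detected Organism: Stenotrophomonas Maltophilia, Comment: new note",
         "Detected Organism: Human Adenovirus 7, Comment: test"),
  stringsAsFactors = FALSE
)

# Drop the column that should not take part in the grouping
dat$detected_id <- NULL

# Paste V7 together within each reviewer_name/created_at/stage combination;
# "." on the right-hand side means "all columns other than V7"
res <- aggregate(V7 ~ ., data = dat, FUN = paste, collapse = "; ")
res
```

On this data the two John rows, which differ only in `detected_id`, now collapse into a single row, giving three rows in total.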

score 3 · Accepted Answer · answered May 06 '16 at 17:50


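library(data.table)

# Convert dat to a data.table in place; then, for each combination of the
# grouping columns, paste the V7 values together separated by "; "
setDT(dat)[, paste(V7, collapse = "; ")
           , by = .(detected_id, reviewer_name, created_at, stage)]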
#   detected_id reviewer_name          created_at stage
#1:          11          mike 2016-05-04 10:02:45     2
#2:           4          john 2016-05-04 10:02:45     1
#                                                                                                                         V1
#1: Detected Organism: Staphylococcus Aureus, Comment: Looks good; Detected Organism: Staphylococcus Aureus, Comment: Note 1
#2:                                                                     Detected Organism: Human Adenovirus 7, Comment: test


eddi


Good solution, works as expected. Thanks! – webDevleoper101 May 06 '16 at 18:01
Check out the edit I made. – webDevleoper101 May 06 '16 at 19:04

@webDevleoper101 I'm not sure what "failed" means to you. It works exactly as expected. A little unclear what you were hoping for - perhaps you want to take `detected_id` out of the `by`. – eddi May 06 '16 at 19:07
OK, you're right. I just removed the 'detected_id' column, then ran the code, and it works as expected. Thanks! – webDevleoper101 May 06 '16 at 19:10
np, glad that worked – eddi May 06 '16 at 19:11
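To make eddi's point concrete, here is a small base R sketch that counts the distinct grouping keys in the edited data with and without `detected_id`. The two John rows differ only in that column (12 vs 4), so keeping it in the key produces four groups instead of the three the asker expected.

```r
# Grouping columns of the edited example data (V7 omitted)
keys <- data.frame(
  detected_id   = c(11, 11, 11, 11, 12, 4),
  reviewer_name = c("Mike", "Mike", "Mike", "Mike", "John", "John"),
  created_at    = rep("2016-05-04 10:02:45", 6),
  stage         = c(2L, 3L, 2L, 3L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Number of groups when detected_id is part of the key
n_with <- nrow(unique(keys))

# Number of groups when detected_id is left out
n_without <- nrow(unique(keys[, c("reviewer_name", "created_at", "stage")]))

c(with_detected_id = n_with, without_detected_id = n_without)
```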

score 0 · Answer 2 · answered May 06 '16 at 17:53


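Using base R:

# Group by every column except V7 and paste each group's V7 values together;
# "; " matches the separator in the desired output
with(dat, aggregate(V7, list(detected_id = detected_id, reviewer_name = reviewer_name,
                             created_at = created_at, stage = stage),
                    paste, collapse = "; "))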


Ananta


I like your solution better, as it's just base R. Please check the edit I just made. – webDevleoper101 May 06 '16 at 19:05
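For comparison, the base R answer can also be written with the formula interface of `aggregate()`. This is a sketch on the original three-row `dat`, not part of Ananta's answer; it names the output column `V7` instead of the default `x` that the `aggregate(V7, list(...), ...)` form produces, and uses `"; "` to match `foo`.

```r
# Original three-row example data from the question
dat <- data.frame(
  detected_id   = c(11, 11, 4),
  reviewer_name = c("mike", "mike", "john"),
  created_at    = rep("2016-05-04 10:02:45", 3),
  stage         = c(2L, 2L, 1L),
  V7 = c("Detected Organism: Staphylococcus Aureus, Comment: Looks good",
         "Detected Organism: Staphylococcus Aureus, Comment: Note 1",
         "Detected Organism: Human Adenovirus 7, Comment: test"),
  stringsAsFactors = FALSE
)

# Formula interface: the column to combine on the left,
# the grouping columns on the right
res <- aggregate(V7 ~ detected_id + reviewer_name + created_at + stage,
                 data = dat, FUN = paste, collapse = "; ")
res
```

To leave `detected_id` out of the grouping, as in the edited question, drop it from the right-hand side of the formula.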
