I have a data.frame in R. I need to compare rows of the data and, if they are the same in every column but one, merge them into a single row and combine the values of that remaining column. I feel like this is a common need when working with R, so ddply (from plyr) or a function from some other package should be able to accomplish this task. Below is the data as is, dat, and what it should look like after some code, foo.
I'm new to R, so any help is greatly appreciated.
Before:
dat <- structure(list(detected_id = c(11, 11, 4), reviewer_name = c("mike",
"mike", "john"), created_at = c("2016-05-04 10:02:45", "2016-05-04 10:02:45",
"2016-05-04 10:02:45"), stage = c(2L, 2L, 1L), V7 = c("Detected Organism: Staphylococcus Aureus, Comment: Looks good",
"Detected Organism: Staphylococcus Aureus, Comment: Note 1",
"Detected Organism: Human Adenovirus 7, Comment: test")), .Names = c("detected_id",
"reviewer_name", "created_at", "stage", "V7"), row.names = c(NA,
-3L), class = "data.frame")
After:
foo <- structure(list(detected_id = c(11L, 4L), reviewer_name = c("mike",
"john"), created_at = structure(c(1L, 1L), .Label = "5/4/16 10:02", class = "factor"),
stage = c(2L, 1L), V7 = structure(c(2L, 1L), .Label = c("Detected Organism: Human Adenovirus 7, Comment: test",
"Detected Organism: Staphylococcus Aureus, Comment: Looks good; Detected Organism: Staphylococcus Aureus, Comment: Note 1"
), class = "factor")), .Names = c("detected_id", "reviewer_name",
"created_at", "stage", "V7"), row.names = c(NA, -2L), class = "data.frame")
EDIT:
The solutions below worked for the dataset I provided. However, I've found a case where they don't work as intended. Here is an example of a data.frame that fails. Note that I no longer need the detected_id column.
dat <- structure(list(detected_id = c(11, 11, 11, 11, 12, 4), reviewer_name = c("Mike",
"Mike", "Mike", "Mike", "John", "John"), created_at = c("2016-05-04 10:02:45",
"2016-05-04 10:02:45", "2016-05-04 10:02:45", "2016-05-04 10:02:45",
"2016-05-04 10:02:45", "2016-05-04 10:02:45"), stage = c(2L,
3L, 2L, 3L, 1L, 1L), V7 = c("Detected Organism: Staphylococcus Aureus, Comment: Looks good",
"Detected Organism: Staphylococcus Aureus, Comment: Looks good",
"Detected Organism: Staphylococcus Aureus, Comment: Note 1",
"Detected Organism: Staphylococcus Aureus, Comment: Note 1",
"Detected Organism: Stenotrophomonas Maltophilia, Comment: new note",
"Detected Organism: Human Adenovirus 7, Comment: test")), .Names = c("detected_id",
"reviewer_name", "created_at", "stage", "V7"), row.names = c(NA,
-6L), class = "data.frame")
SOLUTION: Remove the detected_id column before reshaping the data.frame. Thanks, @eddi.
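For anyone landing here later, this is a minimal sketch of that fix in base R, not necessarily the code from the accepted answer. It assumes the grouping should be on reviewer_name, created_at and stage, and that the V7 values should be joined with "; " as in foo above. The name merged is only for illustration.

```r
# Second example data from the question, rebuilt with data.frame() for brevity
dat <- data.frame(
  detected_id   = c(11, 11, 11, 11, 12, 4),
  reviewer_name = c("Mike", "Mike", "Mike", "Mike", "John", "John"),
  created_at    = "2016-05-04 10:02:45",
  stage         = c(2L, 3L, 2L, 3L, 1L, 1L),
  V7 = c("Detected Organism: Staphylococcus Aureus, Comment: Looks good",
         "Detected Organism: Staphylococcus Aureus, Comment: Looks good",
         "Detected Organism: Staphylococcus Aureus, Comment: Note 1",
         "Detected Organism: Staphylococcus Aureus, Comment: Note 1",
         "Detected Organism: Stenotrophomonas Maltophilia, Comment: new note",
         "Detected Organism: Human Adenovirus 7, Comment: test"),
  stringsAsFactors = FALSE
)

# Drop the obsolete column first so it cannot split otherwise identical groups
dat$detected_id <- NULL

# Assumption: rows count as "the same" when these three columns match
group_cols <- c("reviewer_name", "created_at", "stage")

# Collapse V7 within each group; collapse = "; " is passed through to paste()
merged <- aggregate(dat["V7"], by = dat[group_cols],
                    FUN = paste, collapse = "; ")

print(merged)
```

With detected_id gone, the two John rows (ids 12 and 4) fall into the same group and their V7 strings are combined, which is what I wanted in my case. If you need to keep rows separate by id, leave the column in and add it to group_cols instead.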