Group rows based on identifier without aggregating data

Question

Using R, I wish to combine rows (max of 2) per a common identifier from within the same dataset.

My dataset has an ID, Date and Text field.

df1 <- data.frame(ID = c("1", "1", "2", "3", "3"),
          Date = c("2017-02-12", "2017-03-12", "2017-02-14", "2016-02-12", "2014-02-12"),
          Text = c("Pie", "Cake", "Pie", "IceCream", "Chocolate"))

I want to combine rows based on the identifier without aggregating, for an end result that would double my column count.

Resulting in:

ID|Date.a     | Text.a  | Date.b     |Text.b  
1 |2017-02-12 | Pie     | 2017-03-12 | Cake  
2 |2017-02-14 | Pie     |            |  
3 |2017-02-14 | IceCream| 2017-02-12 |Chocolate

Any help, much appreciated.

Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610). This will make it much easier for others to help you. — Jaap, Feb 24 '17 at 16:45
@Jaap clarified I hope. Perhaps you could offer some insight now? — Scottieie, Feb 24 '17 at 17:25
I changed my original answer to attempt to better answer your question. Although I definitely recommend looking into the tidy data link I posted — tbradley, Feb 24 '17 at 17:39
@Jaap, that looks so nice, with the even lines and all. How on earth did you do that? Looking around for a link to describe how to properly format isn't where I have looked apparently......my amature status is apparent isn't it, hence my 5, errr....3 tokens. — Scottieie, Feb 25 '17 at 22:26
[see here](http://stackoverflow.com/editing-help) for how format you post properly — Jaap, Feb 26 '17 at 11:30

tbradley · Answer 1 · 2017-02-24T17:36:47.300

1

It seems as though your want your data to become messier. Tidy Data is a pretty good resource in how data should typically be formatted in order to perform effective analyses.

Here is an example of a transformation you could make to view the data in the manner in which I think you are looking for while looking cleaner than what your requested output looks like.

library(tidyverse)
df1 <- data.frame(ID = c("1", "1", "2", "3", "3"),
              Date = c("2017-02-12", "2017-03-12", "2017-02-14", "2016-02-12", "2014-02-12"),
              Text = c("Pie", "Cake", "Pie", "IceCream", "Chocolate"))

df2 <- spread(df1, key = Text, value = Date)

> df2
  ID       Cake  Chocolate   IceCream        Pie
1  1 2017-03-12       <NA>       <NA> 2017-02-12
2  2       <NA>       <NA>       <NA> 2017-02-14
3  3       <NA> 2014-02-12 2016-02-12       <NA>

edited Feb 24 '17 at 17:36

answered Feb 24 '17 at 16:52

tbradley

2,210
11
20

I probably should have made it more clear that it is within the same dataset that I wish to combine rows. I will work on making the question more clear. – Scottieie Feb 24 '17 at 17:07
In a sense I do want to make my data messier. The analysis has already happened and now I am doing some reporting. One of those looks at current vs previous records. – Scottieie Feb 24 '17 at 20:03
I appreciate your response to this, and it certainly would have solved a different problem. My goal was to join the two most current entries together for report purposes, side by side. My initial RODBC query pulled out the top two most recent rows per person, keying them 1 and if applicable, 2 (note that some didn't have a second record). I had that part figured out. I was flummoxed at how to join the top and bottom rows. Then separate them out into two tables using !grepl on the key field. Last step was to do a join. Thanks again for willingness to help. SW. – Scottieie Mar 11 '17 at 21:32

Group rows based on identifier without aggregating data

1 Answers1