I would like to merge rows in the my data frame by unique emails, but I do not want to lose any data. To do this I would like the function to combine rows with the same email address. Along with this, if there happens to be overlapping data for an email address that I am trying to combine into one, I want the data from the row with less cells filled in to be added into a new column. Please as questions because I know that I am not explaining this very clearly.
Below is an example of what I am looking for the function to do (data made up).
First Name | Last Name | Phone | Address | Shoe Size | |
---|---|---|---|---|---|
John | Schmitt | jschmitt@gmail.com | 914-392-1840 | address 1 | 4 |
Paul | Johnson | pjohnson@gmail.com | 274-184-3653 | address 2 | 2 |
Brad | Arnold | barnold@gmail.com | 157-135-3175 | address 3 | 5 |
John | Schmitt | jschmitt@gmail.com | 914-392-1840 | 6 |
This sheet should become:
First Name | Last Name | Phone | Address | Shoe Size | Shoe Size 2 | |
---|---|---|---|---|---|---|
John | Schmitt | jschmitt@gmail.com | 914-392-1840 | address 1 | 4 | 6 |
Paul | Johnson | pjohnson@gmail.com | 274-184-3653 | address 2 | 2 | |
Brad | Arnold | barnold@gmail.com | 157-135-3175 | address 3 | 5 |
Basically, the phone number connected to forjschmitt@gmail.com stays in the "Phone" column because it is the same for both rows. Even though the rows are not the same for the address, because the bottom row is blank, it stays the same. Finally, a new column is created for Shoe Size, because there are two differing values for the rows that we are merging. The way that the function should pick which Shoe size to put in Shoe Size 2 is by looking at the number of cells in each row. The shoe size in the row with more cells filled goes in the original Shoe Size column. The shoe size in the row with less cells filled goes in the new Shoe Size 2 column.
Feel free to ask any questions or make any suggestions about how I could do something of this nature in an easier way. I also haven't figured out what to do if the two rows with conflicting data have the same number of cells filled...