Selecting unique rows in R

Question

I have a data.frame with duplicate values in the variable "Time":

> data.old
             Time  Count  Direction
1    100000630955     95          1
2    100000637570      5          0
3    100001330144      7          1
4    100001330144     33          1
5    100001331413     39          0
6    100001331413     43          0
7    100001334038      1          1
8    100001357594     50          0

I need to keep one row per unique value of "Time" and sum the values of "Count" across the duplicated rows, i.e.

> data.new
             Time  Count  Direction
1    100000630955     95          1
2    100000637570      5          0
3    100001330144     40          1
4    100001331413     82          0
5    100001334038      1          1
6    100001357594     50          0

All I managed to do was find the unique values with this command:

> data.old$Time[!duplicated(data.old$Time)]
   [1] 100000630955 100000637570 100001330144 100001331413 100001334038 100001357594

I can do this in a loop, but maybe there is a more elegant solution?

In data.table: `setDT(data.old)[, .(Count=sum(Count), Direction=max(Direction)), by=Time]`. — lmo, Aug 02 '17 at 12:41
@akrun Yes, your solution also works. But how do I keep the other variables? — Dmitry, Aug 02 '17 at 12:42
You can use `aggregate(Count ~., data.old, sum)` if all the other variables are grouping variables — akrun, Aug 02 '17 at 12:50
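A minimal base R sketch of the `aggregate(Count ~ ., ...)` suggestion above, run on the data from the question. It assumes the columns are stored as plain numerics. Note that `.` expands to every column other than Count, so rows are only collapsed when Time and Direction both match; the final sort is added here just to line the result up with the desired data.new.

```r
# Data from the question, assumed to be stored as numeric columns.
data.old <- data.frame(
  Time = c(100000630955, 100000637570, 100001330144, 100001330144,
           100001331413, 100001331413, 100001334038, 100001357594),
  Count = c(95, 5, 7, 33, 39, 43, 1, 50),
  Direction = c(1, 0, 1, 1, 0, 0, 1, 0)
)

# Group by every column except Count (here Time and Direction) and sum Count.
res <- aggregate(Count ~ ., data = data.old, FUN = sum)

# Sort by Time and restore the original column order.
res <- res[order(res$Time), c("Time", "Count", "Direction")]
print(res)
```

If a second variable differed within one Time value, this would return one row per combination rather than one row per Time.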

Andrew Brēza · Accepted Answer · 2017-08-02T13:23:36.883

5

Here's one approach using dplyr. Is this what you want to do?

library(tidyverse)

# One row per Time, with Count summed within each group
data.old %>%
  group_by(Time) %>%
  summarise(Count = sum(Count))

Edit: Keeping other variables

The OP wants to keep the other variables in the data frame, which summarise drops. Assuming each of those variables has the same value in every row being summarised, you could use the Mode function from this SO question.

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

Then change my answer to the following, with one call to Mode for each variable you want kept. This works with both numeric and character data.

library(tidyverse)

# Sum Count and keep the most frequent Direction within each Time
data.old %>%
  group_by(Time) %>%
  summarise(Count = sum(Count), Direction = Mode(Direction))
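To get a feel for what Mode returns, here is a quick check in base R. The input vectors are made up for illustration and are not from the question; the function body is copied from the answer above.

```r
# Mode helper as defined in the answer.
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Numeric input: the most frequent value is returned.
num_mode <- Mode(c(1, 1, 0))

# Character input works the same way.
chr_mode <- Mode(c("up", "down", "down"))

# On a tie, which.max() picks the first maximum,
# so the value that appears first in x is returned.
tie_mode <- Mode(c(0, 1))

print(list(num_mode, chr_mode, tie_mode))
```

The tie case matters only if a variable is not actually constant within a group; in that situation Mode silently picks one value rather than raising an error.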

edited Aug 02 '17 at 13:23

answered Aug 02 '17 at 12:28

Andrew Brēza

Yes, that's the solution. Unfortunately, I forgot to mention that the table has other variables, and with this script they disappear. I'll fix the question now. How can I keep them? – Dmitry Aug 02 '17 at 12:39
@Dmitry I just edited my answer to accommodate additional variables. – Andrew Brēza Aug 02 '17 at 13:24
Your answer also helped me solve some other problems. You really helped! – Dmitry Aug 02 '17 at 14:18
I'm glad to hear that! – Andrew Brēza Aug 02 '17 at 15:53

score 2 · Answer 2 · answered Aug 02 '17 at 12:41

Here is one way using the aggregate function:

data.new <- aggregate(Count ~ Time, data = data.old, FUN = sum, na.rm = TRUE)
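Since `Count ~ Time` drops every other column, one possible way to bring them back in base R is to take them from the first row of each Time value (reusing the `!duplicated()` idea from the question) and merge that with the sums. This is only a sketch under the assumption that the other columns are constant within each Time; the names `sums` and `firsts` are made up for the example.

```r
# Data from the question, assumed to be stored as numeric columns.
data.old <- data.frame(
  Time = c(100000630955, 100000637570, 100001330144, 100001330144,
           100001331413, 100001331413, 100001334038, 100001357594),
  Count = c(95, 5, 7, 33, 39, 43, 1, 50),
  Direction = c(1, 0, 1, 1, 0, 0, 1, 0)
)

# Sum Count per Time.
sums <- aggregate(Count ~ Time, data = data.old, FUN = sum)

# Take the remaining columns from the first row of each Time value.
firsts <- data.old[!duplicated(data.old$Time), c("Time", "Direction")]

# Join the two pieces on Time; merge() sorts the result by the key.
data.new <- merge(sums, firsts, by = "Time")
print(data.new)
```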

RAVI TEJA M

score 2 · Answer 3 · answered Aug 02 '17 at 12:46

library(dplyr)

# Sum Count and keep the unique Direction within each Time
data.old %>%
  group_by(Time) %>%
  summarise(Count = sum(Count),
            Direction = unique(Direction))

Of course, this assumes you want to keep the unique values of the Direction column. It gives one row per Time only when Direction is constant within each Time group.
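Before relying on `unique(Direction)`, it may be worth checking that assumption on the actual data. A small base R sketch, using the data from the question (the names `n_dir` and `all_constant` are made up for the example):

```r
# Data from the question, assumed to be stored as numeric columns.
data.old <- data.frame(
  Time = c(100000630955, 100000637570, 100001330144, 100001330144,
           100001331413, 100001331413, 100001334038, 100001357594),
  Count = c(95, 5, 7, 33, 39, 43, 1, 50),
  Direction = c(1, 0, 1, 1, 0, 0, 1, 0)
)

# Number of distinct Direction values within each Time group.
n_dir <- tapply(data.old$Direction, data.old$Time,
                function(x) length(unique(x)))

# TRUE when every Time group has exactly one Direction value.
all_constant <- all(n_dir == 1)
print(all_constant)
```

If this check comes back FALSE for your data, `unique()` yields more than one value for some groups, and you would need to decide explicitly which value to keep (for example with max, as in the data.table comment, or with Mode).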

Megha John
