I want to subset a data frame so that it keeps, for each group, the row with the maximum value. Here is the code:

mat=matrix(c(0,0,0,1,2,3,4,5,0,0,0,0), ncol=1)
mat=as.data.frame(mat)
colnames(mat) <- sub("V1", "value", colnames(mat))
gr=matrix(c(1,2,2,2,2,3,4,4,4,5,5,5), ncol=1)
gr=as.data.frame(gr)
colnames(gr) <- sub("V1", "group", colnames(gr))
df=as.data.frame(cbind(mat, gr))
data = subset(df, value == max(value))

So I created a dataframe df which looks like this:

      value group
1      0     1
2      0     2
3      0     2
4      1     2
5      2     2
6      3     3
7      4     4
8      5     4
9      0     4
10     0     5
11     0     5
12     0     5

I want the subset to contain the row with the max value of each group, e.g.:

  • For group 1 the max value is 0.
  • For group 2 the max value is 2.
  • For group 3 the max value is 3, and so on.

The result then should be:

    value group
1      0     1
5      2     2
6      3     3
8      5     4
12     0     5

Instead, with subset(df, value == max(value)), I get only the row holding the overall max:

  value group
8     5     4

Any suggestions for a function I could use to solve this?

Uwe Keim
Ville
    Look at the [R-FAQ for finding mean by group](https://stackoverflow.com/q/11562656/903061), but just use `max` instead of `mean`. There are many methods there in Base R, dplyr, data.table, and more. Probably the nicest base R version is `aggregate(value ~ group, df, max)` – Gregor Thomas May 30 '18 at 19:34

2 Answers

Here is a solution with aggregate:

> aggregate(df$value, list(df$group), FUN = max)
  Group.1 x
1       1 0
2       2 2
3       3 3
4       4 5
5       5 0
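
As the comment under the question suggests, the formula interface of aggregate keeps the original column names instead of Group.1 and x. A minimal sketch, assuming df is rebuilt as below (the name max_by_group is only illustrative):

```r
# Rebuild the example data from the question.
df <- data.frame(
  value = c(0, 0, 0, 1, 2, 3, 4, 5, 0, 0, 0, 0),
  group = c(1, 2, 2, 2, 2, 3, 4, 4, 4, 5, 5, 5)
)

# "value ~ group" reads as: summarise value within each level of group.
max_by_group <- aggregate(value ~ group, data = df, FUN = max)
print(max_by_group)
```

The result has one row per group, with the columns named group and value.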

Similarly you can use the summaryBy function in the doBy package, like so:

> require(doBy)
> summaryBy(value ~ group, data = df, FUN = max)
  group value.max
1     1         0
2     2         2
3     3         3
4     4         5
5     5         0
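
Both functions above return a summary table, so the original row names (1, 5, 6, 8, 12 in the question) are lost. The original subset call returns a single row because max(value) is computed over the whole column rather than per group. If the original rows are wanted, one possible base R sketch uses ave to compute the per-group maximum for every row; the names group_max, max_rows and one_per_group are only illustrative:

```r
# Rebuild the example data from the question.
df <- data.frame(
  value = c(0, 0, 0, 1, 2, 3, 4, 5, 0, 0, 0, 0),
  group = c(1, 2, 2, 2, 2, 3, 4, 4, 4, 5, 5, 5)
)

# ave() returns a vector as long as df$value holding each row's group maximum.
group_max <- ave(df$value, df$group, FUN = max)

# Rows that attain their group's maximum; tied rows (group 5) are all kept.
max_rows <- df[df$value == group_max, ]

# Keep one row per group; fromLast = TRUE keeps the last of the tied rows.
one_per_group <- max_rows[!duplicated(max_rows$group, fromLast = TRUE), ]
print(one_per_group)
```

Dropping fromLast = TRUE would keep the first tied row of each group instead (row 10 rather than row 12 for group 5).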
93i7hdjb

Using dplyr, and a more concise way of creating your df:

df <- data.frame(
  value = c(0,0,0,1,2,3,4,5,0,0,0,0),
  group = c(1,2,2,2,2,3,4,4,4,5,5,5)
)

library(dplyr)
df %>% 
  group_by(group) %>% 
  summarize(max.value = max(value))
#> # A tibble: 5 x 2
#>   group max.value
#>   <dbl>     <dbl>
#> 1     1         0
#> 2     2         2
#> 3     3         3
#> 4     4         5
#> 5     5         0

Created on 2018-05-30 by the reprex package (v0.2.0).
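
summarize collapses each group to a single number. If whole rows should be kept instead, a grouped filter mirrors the subset call from the question. This is a sketch, not part of the original answer; it assumes dplyr 1.0.0 or later for slice_max, and the names max_rows and one_row are only illustrative:

```r
library(dplyr)

# Rebuild the example data from the question.
df <- data.frame(
  value = c(0, 0, 0, 1, 2, 3, 4, 5, 0, 0, 0, 0),
  group = c(1, 2, 2, 2, 2, 3, 4, 4, 4, 5, 5, 5)
)

# After group_by(), max(value) is evaluated within each group,
# so every row that attains its group's maximum is kept (ties included).
max_rows <- df %>%
  group_by(group) %>%
  filter(value == max(value)) %>%
  ungroup()

# Exactly one row per group; with_ties = FALSE keeps only one of the tied rows.
one_row <- df %>%
  group_by(group) %>%
  slice_max(value, n = 1, with_ties = FALSE) %>%
  ungroup()
```

Note that tibbles do not carry the original row names, so the row numbers shown in the question are not preserved here.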

Calum You
Phil