
I have a dataframe like this:

salesDat

id    time    amount
1     12       100
1     14       120
1     22       120
2     5        30
2     12       30
3     40       3
3     75       20
3     80       20
3     85       75

What do I want?

For each `id`, I want to keep only the row with the largest `time` value, so that every `id` appears exactly once.

E.g., the new data frame should look like this:

salesDat

id    time    amount
1     22       120
2     12       30
3     85       75
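A minimal base R sketch of "keep the row with the largest `time` for each `id`", assuming the columns are numeric as in the table above: sort by `id` and by descending `time`, then keep the first row of each `id`. The names `ordered` and `result` are illustrative only.

```r
# Data from the question (integer columns assumed)
salesDat <- data.frame(
  id     = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L),
  time   = c(12L, 14L, 22L, 5L, 12L, 40L, 75L, 80L, 85L),
  amount = c(100L, 120L, 120L, 30L, 30L, 3L, 20L, 20L, 75L)
)

# Sort by id, then by time descending, so each id's latest row comes first
ordered <- salesDat[order(salesDat$id, -salesDat$time), ]

# Keep only the first row of each id, i.e. the one with the largest time
result <- ordered[!duplicated(ordered$id), ]
rownames(result) <- NULL
result
```

If two rows of the same `id` share the maximum `time`, this keeps whichever comes first in the original data, because `order()` leaves ties in their original order.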

What did I do?

I used the reshape2 package:

library(reshape2)

uniqueSalesDat <- dcast(salesDat, id ~ time)

The code does not give the result shown above. How do I fix it?
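`dcast()` reshapes long data into wide format (with `id ~ time`, one column per distinct `time` value), so it does not drop any rows. One base R alternative, sketched here under the assumption that `time` and `amount` are numeric: compute the maximum `time` per `id` with `aggregate()`, then `merge()` it back with the original data to recover the matching `amount`. The intermediate name `maxTime` is illustrative.

```r
# Data from the question (integer columns assumed)
salesDat <- data.frame(
  id     = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L),
  time   = c(12L, 14L, 22L, 5L, 12L, 40L, 75L, 80L, 85L),
  amount = c(100L, 120L, 120L, 30L, 30L, 3L, 20L, 20L, 75L)
)

# Step 1: the largest time for each id (one row per id)
maxTime <- aggregate(time ~ id, data = salesDat, FUN = max)

# Step 2: join back on the shared columns (id, time) to recover the
# amount from the matching row
uniqueSalesDat <- merge(maxTime, salesDat, by = c("id", "time"))
uniqueSalesDat
```

Note that if several rows of one `id` tie on the maximum `time`, the merge keeps all of them, so `id` would then no longer be unique.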

floss
  • `aggregate(.~id, df, max)` or `df %>% group_by(id) %>% summarise_all(max)` in `dplyr`. – Ronak Shah Mar 12 '20 at 05:46
  • Thanks! How do I deal with `NA` values in the `time` column? – floss Mar 12 '20 at 06:03
  • `df %>% group_by(id) %>% summarise_all(max, na.rm =TRUE)` – Ronak Shah Mar 12 '20 at 06:04
  • I am getting this now: `Error in Summary.factor(c(2L, 2L, 1L, : ‘max’ not meaningful for factors` – floss Mar 12 '20 at 06:09
  • `df %>% type.convert(as.is = TRUE) %>% group_by(id) %>% summarise_all(max, na.rm =TRUE)` – Ronak Shah Mar 12 '20 at 06:10
  • That removes the `factor` error, but I still do not get the `max` of `time` grouped by `id`. – floss Mar 12 '20 at 06:15
  • Then please add your data to the post using `dput`. It works fine on my end and gives exactly the expected output with this data: `structure(list(id = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L), time = c(12L, 14L, 22L, 5L, 12L, 40L, 75L, 80L, 85L), amount = c(100L, 120L, 120L, 30L, 30L, 3L, 20L, 20L, 75L)), class = "data.frame", row.names = c(NA, -9L))` – Ronak Shah Mar 12 '20 at 06:21
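The `factor` error and the `NA` question from the comments can also be handled in base R. The sketch below assumes a hypothetical messy version of the data in which every column was read in as a factor and one `time` is missing; it converts the factor columns to numbers, drops rows with a missing `time`, and picks the row with the largest `time` per `id` via `split()` and `which.max()`. The names `complete`, `best`, and `result` are illustrative.

```r
# Hypothetical messy version of the data: every column is a factor and one
# time value is missing
salesDat <- data.frame(
  id     = factor(c(1, 1, 1, 2, 2, 3, 3, 3, 3)),
  time   = factor(c(12, 14, 22, 5, 12, NA, 75, 80, 85)),
  amount = factor(c(100, 120, 120, 30, 30, 3, 20, 20, 75))
)

# Convert factor columns to numbers; as.numeric() on a factor alone would
# return the internal level codes, so go through as.character() first
salesDat[] <- lapply(salesDat, function(col) {
  if (is.factor(col)) as.numeric(as.character(col)) else col
})

# Drop rows with a missing time, then, within each id, pick the row index
# with the largest time (which.max returns the first index on ties)
complete <- salesDat[!is.na(salesDat$time), ]
best <- vapply(split(seq_len(nrow(complete)), complete$id),
               function(i) i[which.max(complete$time[i])],
               integer(1))
result <- complete[best, ]
rownames(result) <- NULL
result
```

Unlike `max` applied to every column, which takes each column's maximum separately, this keeps the complete row with the latest `time`. The two agree on the question's data only because, for each `id`, the largest `amount` happens to sit on that row.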
