I have a table:
id time
1 1
1 2
1 5
2 3
2 2
2 7
3 8
3 3
3 14
And I want to convert it to:
id first last
1 1 5
2 3 7
3 8 14
Please help!
I have a table:
id time
1 1
1 2
1 5
2 3
2 2
2 7
3 8
3 3
3 14
And I want to convert it to:
id first last
1 1 5
2 3 7
3 8 14
Please help!
We can use data.table
. Convert the 'data.frame' to 'data.table' (setDT(df1)
), grouped by 'id', we get the first
and last
value of 'time'
library(data.table)
setDT(df1)[, list(firstocc = time[1L], lastocc = time[.N]),
by = id]
Or with dplyr
, we use the same methodology.
library(dplyr)
df1 %>%
group_by(id) %>%
summarise(firstocc = first(time), lastocc = last(time))
Or with base R
(no packages needed)
do.call(rbind, lapply(split(df1, df1$id),
function(x) data.frame(id = x$id[1],
firstocc = x$time[1], lastocc = x$time[nrow(x)])))
If we need to be based on the min
and max
values (not related to the expected output) , the data.table
option is
setDT(df1)[, setNames(as.list(range(time)),
c('firstOcc', 'lastOcc')) ,id]
and dplyr
is
df1 %>%
group_by(id) %>%
summarise(firstocc = min(time), lastocc = max(time))
There are many packages that can perform aggregation of this sort in R. We show how to do it without any packages and then show it with some packages.
1) Use aggregate
. No packages needed.
ag <- aggregate(time ~ id, DF, function(x) c(first = min(x), last = max(x)))
giving:
> ag
id time.first time.last
1 1 1 5
2 2 2 7
3 3 3 14
ag
is a two column data frame whose second column contains a two column matrix with columns named 'first' and 'last'. If you want to flatten it to a 3 column data frame use:
do.call("cbind", ag)
giving:
id first last
[1,] 1 1 5
[2,] 2 2 7
[3,] 3 3 14
1a) This variation of (1) is more compact at the expense of uglier column names.
aggregate(time ~ id, DF, range)
2) sqldf
library(sqldf)
sqldf("select id, min(time) first, max(time) last from DF group by id")
giving:
id first last
[1,] 1 1 5
[2,] 2 2 7
[3,] 3 3 14
3) summaryBy summaryBy in the doBy package is much like aggregate
:
library(doBy)
summaryBy(time ~ id, data = DF, FUN = c(min, max))
giving:
id time.min time.max
1 1 1 5
2 2 2 7
3 3 3 14
Note: Here is the input DF
in reproducible form:
Lines <- "id time
1 1
1 2
1 5
2 3
2 2
2 7
3 8
3 3
3 14"
DF <- read.table(text = Lines, header = TRUE)
Update: Added (1a), (2) and (3) and fixed (1).
You can remove duplicates and reshape it
dd <- read.table(header = TRUE, text = "id time
1 1
1 2
1 5
2 3
2 2
2 7
3 8
3 3
3 14")
d2 <- dd[!(duplicated(dd$id) & duplicated(dd$id, fromLast = TRUE)), ]
reshape(within(d2, tt <- c('first', 'last')), dir = 'wide', timevar = 'tt')
# id time.first time.last
# 1 1 1 5
# 4 2 3 7
# 7 3 8 14