R: How to aggregate data but keep informations from non-aggregated columns?

Question

Searched for a solution for two days to no avail so far.

I have bird observations from different observation points. The observers write down the species, where they have seen them, and for how long.

Now it happens that from different points, observations are taken from the same area, but we only want to process the maximum value per species in an area.

So first, i aggregated the data by observation point, species and area, and summed up the time.

dt.agg <- aggregate(time ~ observp + species + time, dt, sum)

UUPS: completly wrong command:

should have been:

dt.agg <- aggregate(time ~ observp + species + area, dt, sum)



   observp species area time
1       1a  Rm    A1        43.878488
2       1c  Rm    A1       296.152707
3        2  Rm    A1        29.546790
4       1a Swm    A1        34.127713
5       1b Swm    A1        11.076880
6        2 Swm    A1         8.771703

This worked ok. But now, I only need the maximum value for time for a species in an area, BUT i also need to know from which observation point these numbers were taken.

In my example, row 2 should be kept for Rm in A1, while rows 1 and 3 should be dropped. The same applies to row 4 (keep) and 5 + 6 (drop)

When i just do another aggregate with species and area over time and max, the info for the observation point is lost.

Can someone please show me a way to achieve this?

Cheers

Bernd

(now with a new account and no reputation .. thank you ... google!)

p.s. Please feel free to give this question a better headline

UPDATE: trying to post the dput(head(dt,100))-sample as suggested. The original dataset has over 1300 rows. Hope thats what you want to have.

    structure(list(species = structure(c(3L, 3L, 3L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 5L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 5L, 5L, 
5L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 5L, 
5L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 5L, 5L, 5L, 
3L, 3L, 3L, 3L, 3L, 3L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L), .Label = c("Bf", 
"Gr", "Rm", "Row", "Swm", "Wf", "Wsb", "Wst", "Ww"), class = "factor"), 
    area = structure(c(35L, 19L, 34L, 34L, 32L, 19L, 34L, 35L, 
    10L, 36L, 10L, 14L, 13L, 25L, 27L, 28L, 34L, 19L, 14L, 14L, 
    34L, 1L, 12L, 13L, 15L, 3L, 3L, 34L, 34L, 34L, 14L, 14L, 
    13L, 13L, 1L, 1L, 1L, 11L, 1L, 8L, 21L, 22L, 22L, 9L, 9L, 
    9L, 5L, 9L, 3L, 22L, 27L, 26L, 21L, 26L, 21L, 27L, 3L, 9L, 
    20L, 20L, 9L, 26L, 34L, 30L, 3L, 2L, 3L, 4L, 20L, 3L, 37L, 
    16L, 17L, 18L, 14L, 35L, 34L, 34L, 34L, 36L, 4L, 4L, 3L, 
    3L, 17L, 17L, 38L, 36L, 10L, 38L, 36L, 10L, 38L, 37L, 35L, 
    30L, 16L, 15L, 17L, 5L), .Label = c("A1", "A10", "A11", "A12", 
    "A13", "A14", "A15", "A16", "A17", "A18", "A2", "A3", "A4", 
    "A5", "A6", "A7", "A8", "A9", "O1", "O10", "O11", "O12", 
    "O13", "O14", "O15", "O16", "O17", "O18", "O19", "O2", "O20", 
    "O21", "O22", "O3", "O4", "O5", "O7", "O8", "O9"), class = "factor"), 
    observp = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L), .Label = c("1a", "1b", "1c", "2", "3", "4"), class = "factor"), 
    time = c(36.37086972, 2.730715967, 1.891286914, 3.782573827, 
    4.496276059, 5.461431934, 18.91286914, 13.22577081, 5.823001976, 
    5.392743201, 3.882001317, 16.97305991, 6.094384821, 5.274262222, 
    5.462035947, 2.089427691, 7.565147654, 21.84572774, 25.45958986, 
    16.97305991, 7.565147654, 4.875387532, 8.885792099, 4.062923214, 
    6.636122805, 7.038317277, 10.55747592, 7.565147654, 7.565147654, 
    3.782573827, 25.45958986, 25.45958986, 12.18876964, 12.18876964, 
    19.50155013, 19.50155013, 9.750775065, 39.20627398, 4.875387532, 
    6.423076843, 2.436283538, 1.823249104, 1.823249104, 16.72889022, 
    41.82222555, 33.45778044, 12.30932064, 117.1022315, 3.519158639, 
    1.823249104, 27.31017974, 11.11346598, 4.872567077, 11.11346598, 
    4.872567077, 5.462035947, 3.519158639, 16.72889022, 14.86012871, 
    8.916077225, 25.09333533, 22.22693195, 3.782573827, 5.184879322, 
    10.55747592, 8.509038411, 10.55747592, 17.70988435, 5.944051483, 
    3.519158639, 17.69229328, 34.70586347, 5.966017168, 3.092236431, 
    2.828843318, 6.612885403, 3.782573827, 3.782573827, 7.565147654, 
    5.392743201, 17.70988435, 17.70988435, 3.519158639, 2.346105759, 
    11.93203434, 11.93203434, 2.386548395, 0.898790534, 0.64700022, 
    2.386548395, 0.898790534, 0.64700022, 2.684866944, 6.634609979, 
    1.239916013, 1.944329746, 3.2536747, 3.732819078, 6.711769315, 
    2.307997621)), .Names = c("species", "area", "observp", "time"
), row.names = c(NA, 100L), class = "data.frame")

Hi, could you provide the result of `dput(dt)` or a subset of it if it is too large, like `dput(head(dt,100))`? Also, what was the second `aggregate` that you tried? — Frank, Oct 07 '13 at 22:14
Bernd, please use `dput(yourobject)` or `dput(head(yourobject))` to add a sample of your data (or fake data) to your question. [This post](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) is the canonical resource on how to do that. — SlowLearner, Oct 07 '13 at 22:15
I wasn't me for a while (account confusion). The marked solution worked for me. Hadn't had the time to check the others. Thanx all for your kind help! — Bernd V., Oct 10 '13 at 15:51

Henrik · Accepted Answer · 2013-10-07T23:32:15.947

You may also have a look another base function, by. The output is a list where each element is the result for different combination of INDICES.

bb <- by(data = df, INDICES = list(df$species, df$area), function(x) x[which.max(x$time), ])
bb
# : Rm
# : A1
# observp species area     time
# 2      1c      Rm   A1 296.1527
# -------------------------------------------------------------------- 
# : Swm
# : A1
# observp species area     time
# 4      1a     Swm   A1 34.12771

If you want to convert the list to a data.frame:

df2 <- do.call(rbind, bb)
df2
# observp species area      time
# 2      1c      Rm   A1 296.15271
# 4      1a     Swm   A1  34.12771

Another alternative:

library(plyr)
ddply(.data = df, .variables = .(species, area), subset,
  time == max(time))

This one seems to do the trick for the result of my first aggregation. I get the same 285 row like with my simple second aggregation of species and area, but this time with the proper observation point kept to the data. Will verify tomorrow, now toooo late ... Many thanx to all so far! — Bernd V., Oct 07 '13 at 23:42

score 0 · Answer 2 · answered Sep 22 '15 at 11:57

0

An example is

stulevel_agg_2 <- stulevel[, list(a1=mean(ability, na.rm = TRUE), a2=last(school, na.rm=T)),by = grade]

a1, a2 are new column names. last can take the last element within the group, but you need to load xts first.

answered Sep 22 '15 at 11:57

Minh Nguyen

111
1
4

R: How to aggregate data but keep informations from non-aggregated columns?

2 Answers2