0

What is the minimum DSSite_Dis for each SiteID within each year?

I have tried:

ds_DT <- ds_fish[ , .SD[which.min(DSSite_Dis)], by = c("SiteID", "year")]

however SiteID and year are different lengths.

I can't get my head around looping this by iterating through year within SiteID.which.min is a data.table function but happy to use something else.

10 Rep
  • 2,217
  • 7
  • 19
  • 33
Josh J
  • 395
  • 1
  • 3
  • 13
  • 1
    You should provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and the desired output for that input so that possible solutions can be tested and verified. – MrFlick Oct 26 '17 at 15:26

2 Answers2

3

SiteID and year aren't different lengths. It may well be the case that ds_fish[ , uniqueN(SiteID)] is different from ds_fish[ , uniqueN(year)], but by the nature of ds_fish being a data.table, it must be the case that length(ds_fish$SiteID) == length(ds_fish$year) (in turn both are equal to nrow(ds_fish)).

The code you provided is close to what you want. Here's what you want:

ds_fish[ , .(min_site_dis = min(DSSite_Dis)), by = .(SiteID, year)]

What you've done is return the full observation for each minimal DSSite_Dis -- this will return not only the minimal DSSite_Dis, but also the values of the remaining columns of ds_fish in the rows corresponding to these minima.

A conceptual note for you to chew on -- you phrased what you're after as "the minimum DSSite_Dis for each SiteID within each year"; you should convince yourself that this is the same as the minimum DISSite_Dis for each year within each SiteID.

MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
0

Original code works now.

Here it is in full

library(foreign)
library(data.table)

file <- 'DownstreamSites.dbf' # ds_fish dbf
ds_fish <- read.dbf(file, as.is = FALSE)
file <- 'UpstreamSites.dbf' # nearest_us_fish dbf
us_fish <- read.dbf(file, as.is = FALSE)
file <- 'barriers.dbf' # barriers dbf
barriers <- read.dbf(file, as.is = FALSE)
file <- 'metrics.dbf' # barriers dbf
metrics <- read.dbf(file, as.is = FALSE)
metrics <- metrics[c(-1:-3,-8:-34,-41,-42,-45:-47,-49:-52)]

colnames(metrics)[13] <- "DSSite_ID"
ds_fish1 <- merge(metrics, ds_fish, by = 'DSSite_ID')
colnames(metrics)[13] <- "USSite_ID"
us_fish1 <- merge(metrics, us_fish, by = 'USSite_ID')

year <- format(as.Date(us_fish1$event_date, format="%d/%m/%Y"),"%Y")
us_fish1$year <- year
year <- format(as.Date(ds_fish1$event_date, format="%d/%m/%Y"),"%Y")
ds_fish1$year <- year

ds_fish2 <- data.table(ds_fish1)
us_fish2 <- data.table(us_fish1)

ds_DT <- ds_fish2[ , .SD[which.min(DSSite_Dis)], by = c("SiteID", "year")]
us_DT <- us_fish2[ , .SD[which.min(USSite_Dis)], by = c("SiteID", "year")]
Josh J
  • 395
  • 1
  • 3
  • 13