
Hope this makes sense: I receive data from colleagues as CSV files, each of which can be thousands of lines long. The files have multiple columns, but initially the two I am interested in are named "target" and "temperature". There are multiple categories of "target", and within each category there can be many (or few) "temperature" data points. For example:

target       temperature
RSV          87.2
RSV          86.9
......
HSV          84.3
HSV          89.7

etc

Each target has its own defined temperature range, so I need some way of defining these ranges and then counting how many samples for each target fall within or outside that target's range.
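A minimal sketch of one way to do this, assuming the ranges can be kept in a small lookup table with one row per target. The limits below, and the names `ranges`, `min.temperature` and `max.temperature`, are hypothetical placeholders rather than real values:

```r
# Sample data in the shape described above (values from the example)
df <- data.frame(target = c("RSV", "RSV", "HSV", "HSV"),
                 temperature = c(87.2, 86.9, 84.3, 89.7))

# Hypothetical limits, one row per target: replace with the real ranges
ranges <- data.frame(target = c("RSV", "HSV"),
                     min.temperature = c(86.0, 85.0),
                     max.temperature = c(88.0, 90.0))

# Attach each sample's limits by matching on the target column
dat <- merge(df, ranges, by = "target")

# TRUE when the sample lies in the closed interval [min, max]
dat$in.range <- dat$temperature >= dat$min.temperature &
                dat$temperature <= dat$max.temperature

# Count samples within / outside the range for each target
counts <- table(dat$target,
                factor(dat$in.range, levels = c(TRUE, FALSE),
                       labels = c("within", "outside")))
print(counts)
```

Because `merge()` performs an inner join by default, samples whose target has no row in `ranges` are silently dropped; `merge(..., all.x = TRUE)` keeps them with `NA` limits instead.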

Any and all suggestions gratefully received

neilfws
Lee
  • What have you tried? What is your desired output? Please make this question [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – r2evans Mar 10 '17 at 03:13
  • See `?cut` or `?findInterval` to define your in-range values. – thelatemail Mar 10 '17 at 03:22
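Following the `findInterval` suggestion, a sketch for a single target; the limits in `lims` are hypothetical placeholders. The returned codes 0, 1 and 2 mean below, within and above the range:

```r
# Temperatures for one target (values from the example data)
temps <- c(84.3, 89.7, 88.7)

# Hypothetical limits for this target: replace with the real ones
lims <- c(85.0, 90.0)

# findInterval() gives 0 below lims[1], 1 inside, 2 above lims[2];
# rightmost.closed = TRUE makes the upper limit itself count as inside
pos <- findInterval(temps, lims, rightmost.closed = TRUE)

# Label the codes and count them
status <- factor(pos, levels = 0:2, labels = c("below", "within", "above"))
print(table(status))
```

`cut()` can produce similar labels, but its bins are half-open, so one of the two limits would be classed as outside; `findInterval()` with `rightmost.closed = TRUE` treats both limits as inside.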

1 Answer


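The script below defines a range for each target and then counts how many samples of that target fall within or outside it. As a placeholder, each range is simply the observed minimum and maximum temperature for that target, so every sample is counted as within; substitute the real limits to get meaningful "outside" counts.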

# data from colleagues
df <- data.frame(target=c("RSV", "RSV", "RSV", "RSV",
                          "HSV", "HSV", "HSV",
                          "SRV", "SRV", "SRV"),
                 temperature=c(87.2, 86.9, 86.8, 86.7,
                               84.3, 89.7, 88.7,
                               54.3, 59.7, 58.7))

# results: one row per target with its range and within/outside counts
res <- data.frame(target=character(0),
                  min.temperature=numeric(0),
                  max.temperature=numeric(0),
                  within=numeric(0),
                  outside=numeric(0))

# targets (levels() returns NULL for a character column, which is the
# default for data.frame() since R 4.0.0; sort(unique()) works either way)
l <- sort(unique(as.character(df$target)))

for (i in seq_along(l)) {
  temps <- df$temperature[df$target == l[i]]

  # some way of defining these ranges
  # (placeholder: the observed min and max for this target)
  t.min <- min(temps)
  t.max <- max(temps)

  # samples of this target in [min; max]
  in.range <- temps >= t.min & temps <= t.max

  t.within <- sum(in.range)
  t.outside <- sum(!in.range)

  res <- rbind(res, data.frame(target=l[i],
                     min.temperature=t.min,
                     max.temperature=t.max,
                     within=t.within,
                     outside=t.outside))
}

print(res)
#   target min.temperature max.temperature within outside
# 1    HSV            84.3            89.7      3       0
# 2    RSV            86.7            87.2      4       0
# 3    SRV            54.3            59.7      3       0
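Because the real data arrive as several CSV files with extra columns, here is a sketch of building `df` from a folder of them. The folder, the two files written at the top, and their extra `well` column are hypothetical stand-ins so the example runs on its own:

```r
# Hypothetical stand-in data: write two small CSV files to a temporary folder.
# With real data, set csv.dir to the folder that holds the colleagues' files.
csv.dir <- tempfile("runs")
dir.create(csv.dir)
write.csv(data.frame(target = c("RSV", "RSV", "HSV"),
                     temperature = c(87.2, 86.9, 84.3),
                     well = c("A1", "A2", "A3")),
          file.path(csv.dir, "run1.csv"), row.names = FALSE)
write.csv(data.frame(target = c("HSV", "SRV"),
                     temperature = c(89.7, 54.3),
                     well = c("B1", "B2")),
          file.path(csv.dir, "run2.csv"), row.names = FALSE)

# Read every .csv file in the folder, keeping only the two columns of interest
files <- list.files(csv.dir, pattern = "\\.csv$", full.names = TRUE)
df <- do.call(rbind, lapply(files, function(f) {
  read.csv(f, stringsAsFactors = FALSE)[, c("target", "temperature")]
}))

# Number of samples per target across all files
print(table(df$target))
```

The combined `df` has the same two columns as the hand-built data frame in the answer, so it can be passed to the loop unchanged.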